Unusual

11-50 employees
7 jobs posted

Please mention that you found this job on tryremotely.com. Thanks & good luck!

Software Engineer, ML Serving

Added: 7 days ago
Location: 🇺🇸 San Francisco
Type: Full time
Salary: Not provided
Related skills: docker, kubernetes, nvidia, triton, dynamo, vllm

📋 Description

Architect and implement the TTS serving infra (GPU-based).
Optimize models for multi-node/disaggregated serving.
Ensure NVIDIA hardware compatibility (Hopper–Blackwell) for on-prem/cloud.
Build CI/CD workflows for the serving pipeline.
Maintain site reliability: on-call, monitoring, alerts, observability.
Provision resources and manage GPU fleet costs.

🎯 Requirements

Hands-on experience with real-time multi-node ML serving infra, using Dynamo, Triton, vLLM, or equivalent.
Experience with distributed or disaggregated model serving (Tensor Parallel, Pipeline Parallel, or equivalent).
Strong cloud fundamentals: Linux, networking, Docker, Kubernetes.
IaC experience with Terraform, Packer, or equivalents.
On-call is part of the job; production reliability is a shared responsibility.
SRE, DevOps, or platform engineering background is a plus.

🎁 Benefits

Build the serving infra behind a voice AI company.
Direct collaboration with inference, platform, and ML teams.
Your work shapes what customers deploy at scale.
Meaningful equity upside at an early stage.
High ownership, high standards, low bureaucracy.
SF / Bay Area presence.

Apply on employer's website

This employer gathers applications via their own applicant tracking system.

You will be redirected to an external application form.
