Software Engineer, ML Serving

Added: 7 days ago
Type: Full time
Salary: Not provided

Related skills

Docker, Kubernetes, NVIDIA, Triton, Dynamo, vLLM

📋 Description

  • Architect and implement the TTS serving infra (GPU-based).
  • Optimize models for multi-node/disaggregated serving.
  • Ensure NVIDIA hardware compatibility (Hopper–Blackwell) for on-prem/cloud.
  • Build CI/CD workflows for the serving pipeline.
  • Maintain site reliability: on-call, monitoring, alerts, observability.
  • Provision resources and manage GPU fleet costs.

🎯 Requirements

  • Hands-on experience with real-time multi-node ML serving infra: Dynamo, Triton, vLLM, or equivalent.
  • Experience with distributed or disaggregated model serving (Tensor Parallel, Pipeline Parallel, or equivalent).
  • Strong cloud fundamentals: Linux, networking, Docker, Kubernetes.
  • IaC experience with Terraform, Packer, or equivalents.
  • On-call is part of the job; production reliability is a shared responsibility.
  • SRE, DevOps, or platform engineering background is a plus.

🎁 Benefits

  • Build the serving infra behind a voice AI company.
  • Direct collaboration with inference, platform, and ML teams.
  • Your work shapes what customers deploy at scale.
  • Meaningful equity upside at an early stage.
  • High ownership, high standards, low bureaucracy.
  • SF / Bay Area presence.