Application Software Engineer, Inference

Added: 2 days ago
Type: Full time

Related skills

gRPC, Rust, Docker, Python, Kubernetes

📋 Description

  • Develop highly reliable, high-throughput inference systems across SpaceX
  • Architect scalable distributed infrastructure for model serving, including load balancing and auto-scaling
  • Optimize latency under production workloads using GPU kernels and quantization
  • Build high-concurrency serving systems with high uptime, low tail latency, and strong observability
  • Own components end to end, such as routing, SDKs, rate limiting, and scaling
  • Benchmark and accelerate inference engines (SGLang, vLLM, TensorRT-LLM)

🎯 Requirements

  • Bachelor's degree in CS, engineering, or math, or 2+ years of software experience
  • Experience designing and maintaining scalable distributed systems
  • 1+ year of full-stack or backend development on production systems
  • 1+ year of Rust or C++ experience
  • Experience with LLM inference engines and serving frameworks (SGLang, vLLM, Triton, TensorRT-LLM)
  • Deep experience in low-level systems programming: GPU kernels, batching, caching, quantization

🎁 Benefits

  • Medical, vision, and dental coverage; 401(k); disability and life insurance
  • Paid time off: 3 weeks vacation and 10+ holidays; paid sick leave
  • Employee stock purchase plan and long-term incentives
  • Parental leave, plus discounts and perks