Site Reliability Engineer (Senior or Staff), Storage Layer Services (SLS)

Added: 25 days ago
Type: Full time
Salary: Not provided

Related skills: Azure, Linux, AWS, networking, Python

📋 Description

  • Work on multi-tenant distributed storage systems for Atlas
  • Build reliable, resilient, self-healing services
  • Define metrics to detect incidents and track service health
  • Participate in a 24/7 on-call rotation for storage infra
  • Become an expert in infrastructure performance from app to kernel

🎯 Requirements

  • 6+ years in software development and distributed systems
  • Proficiency in Python, Go, or similar
  • Experience operating stateful storage or databases at scale, including durability, consistency, and recovery trade-offs
  • Experience with containerization, especially Kubernetes
  • Expertise in cloud platforms (AWS, GCP, or Azure)
  • Understanding of Linux internals and networking (TCP/IP, DNS, TLS)

🎁 Benefits

  • Employee affinity groups
  • Fertility assistance
  • Generous parental leave policy
  • Disability accommodations during the application and interview process