Staff Site Reliability Engineer

Added: 9 days ago
Type: Full time
Salary: Not provided

Related skills

GitOps, Terraform, Helm, Python, Kubernetes

📋 Description

  • Design and operate large-scale cloud infrastructure.
  • Participate in the on-call rotation for highly available systems.
  • Lead incident response and post-incident reviews.
  • Define and improve SLIs, SLOs, and error budgets.
  • Partner with engineers to improve availability, scalability, and resilience.
  • Build observability via metrics, logs, traces, dashboards, and alerting.

🎯 Requirements

  • Strong experience operating large-scale production services in AWS and/or GCP.
  • Deep Kubernetes production expertise (networking, storage, scheduling, scaling).
  • Proficiency in infrastructure as code (IaC) with Terraform and Helm.
  • Software engineering proficiency in Go and Python.
  • Experience building automation and internal platforms with GitOps/CI-CD.
  • Experience with distributed data stores (PostgreSQL, Redis, OpenSearch).

๐ŸŽ Benefits

  • Supporting your well-being
  • Driving social impact
  • Developing talent and fostering connection
  • Immersive onboarding across multiple offices