Added: 1 day ago
Type: Full time
Salary: Not provided

Related skills

cloud, linux, python, distributed systems, go

📋 Description

  • Partner with Ads Engineering to improve reliability, scalability, and operational excellence of ad systems.
  • Design, build, and maintain infrastructure, tooling, and automation for reliability.
  • Improve observability through monitoring, alerting, tracing, logging, and dashboards.
  • Participate in on-call rotations and lead incident response for production systems.
  • Run root cause analysis and drive corrective actions after incidents.
  • Drive adoption of SRE practices including SLIs, SLOs, capacity planning, and operational readiness reviews.

🎯 Requirements

  • 5+ years in SRE/Infra or related roles operating large distributed systems.
  • Strong experience supporting high-traffic, user-facing production environments.
  • Good understanding of distributed systems, networking, Linux, and cloud-native architectures.
  • Good programming skills in Go, Python, or similar.
  • Demonstrated ability to troubleshoot complex issues across applications, infrastructure, networking, and services.
  • Experience with observability platforms, monitoring, alerting, and incident response.

๐ŸŽ Benefits

  • Global benefits for workspace, development, and caregiving support.
  • Family Planning Support.
  • Gender-affirming Care.
  • Mental Health and Coaching Benefits.
  • Private pension plan with employer matching.
  • 100% employer-sponsored medical plan.