Added: 1 day ago
Type: Full time
Salary: Not provided

Related skills: cloud, Linux, Python, Go, incident response

📋 Description

  • Partner with Ads Engineering to improve reliability, scalability, and operational excellence across ad-serving systems.
  • Design, build, and maintain infrastructure, tooling, and automation that improve system reliability and engineering productivity.
  • Improve observability through monitoring, alerting, tracing, logging, and dashboards.
  • Participate in on-call rotations and lead incident response efforts for critical production systems.
  • Conduct root cause analyses and drive corrective actions after incidents.
  • Collaborate with software engineers throughout the service lifecycle, from design reviews through production operations.

🎯 Requirements

  • 5+ years of experience in SRE, infrastructure, or related roles working on large-scale distributed systems.
  • Strong experience supporting high-traffic, user-facing production environments.
  • Good understanding of distributed systems, networking, Linux, and cloud-native architectures.
  • Good programming skills in Go, Python, or a similar language.
  • Demonstrated ability to troubleshoot complex issues across applications, infrastructure, networking, and services.
  • Experience with observability platforms, monitoring, alerting, and incident response.

๐ŸŽ Benefits

  • Global benefits supporting workspace, development, and caregiving
  • Family Planning Support
  • Gender-Affirming Care
  • Mental Health & Coaching Benefits
  • Private Medical, Dental, and Vision Benefits
  • Personal Retirement Savings Account with matching