Added: 1 day ago
Type: Full-time
Salary: Not provided

Related skills

BigQuery, Python, Kubernetes, Go, Kafka

📋 Description

  • Lead reliability initiatives across Ads domains (serving, auctions, targeting, reporting, measurement, billing).
  • Partner with engineering leadership to improve reliability, scalability, and developer productivity across Ads.
  • Drive architecture reviews and influence decisions on revenue-critical systems.
  • Design and build platforms, tooling, and automation to improve reliability at scale.
  • Participate in on-call rotations and coordinate incident response during major production events.
  • Identify systemic reliability risks and drive long-term platform resilience.

🎯 Requirements

  • 8+ years of experience in Site Reliability or Infrastructure Engineering on large-scale distributed systems.
  • Strong experience supporting high-traffic production environments.
  • Deep understanding of distributed systems, networking, Linux, and cloud-native architectures.
  • Experience designing highly available systems with strong operational practices.
  • Strong observability skills: metrics, logging, tracing, and alerting.
  • Proficiency in Go, Python, or a similar language.

๐ŸŽ Benefits

  • Global benefits: workspace, development, and caregiving support
  • Family planning support
  • Mental health and coaching benefits
  • Private medical, dental, and vision coverage
  • Generous paid parental leave
  • Flexible vacation and paid volunteer time off