Site Reliability Engineer

Added
4 hours ago
Type
Full time

Related skills

Grafana, Prometheus, Kubernetes, distributed systems, Go

📋 Description

  • Lead scaling of operational resilience and observability
  • Own stability, debugging workflows, and incident response
  • Design tools to turn chaos into clarity and reduce incidents
  • Improve developer focus and system uptime through reliability
  • Shape how reliability is practiced with high impact and trust
  • Drive proactive operations and internal reliability tooling

🎯 Requirements

  • 3+ years debugging production systems (logs, traces, incidents)
  • Strong problem solving in unfamiliar backend codebases
  • Strong Go and Kubernetes experience
  • Familiarity with Grafana, Prometheus, Sentry
  • Clear, calm communication under pressure during live incidents
  • Experience with distributed systems or services at scale

๐ŸŽ Benefits

  • High-growth AI startup backed by top investors
  • Fast growth with ownership of projects
  • Work with a world-class engineering team
  • Flexible culture and collaborative environment
  • Learning and career advancement opportunities
  • Impactful work that ships to enterprise-scale systems