Platform Reliability Engineer

Added: 1 hour ago
Type: Full time
Salary: not provided

Related skills

node.js pagerduty aws grafana prometheus

📋 Description

  • Operate and improve our monitoring stack (Prometheus, Grafana, OpenTelemetry)
  • Instrument services and shape actionable alerts in production
  • Define incident runbooks, status pages, and post-incident learning
  • Collaborate with platform teams to raise reliability standards
  • Document tooling and processes engineers actually use
  • Automate tasks and improve developer workflows

🎯 Requirements

  • Hands-on experience selecting production signals that reflect customer experience
  • Experience handling incidents and alerts from detection to resolution
  • Hands-on experience with Prometheus, Grafana, OpenTelemetry, and PagerDuty
  • Ability to read and write code across services and pipelines
  • Knowledge of post-incident culture (blame-free, learning-focused)
  • Ability to write concise guidance and drive decisions

🎁 Benefits

  • Full-time in Prague or Brno with remote option
  • Flexible hours and work-life balance
  • Stock options and profit sharing
  • Generous hardware budget
  • Epic team-building events and offsites
  • Free lunches and snacks at the office