Operations Engineering Manager, Fleet Reliability

Added: 4 days ago
Type: Full time

Related skills

SRE, capacity planning, observability, incident management, fleet management

📋 Description

  • Build and lead a 24/7 team of reliability and observability engineers.
  • Lead provisioning, validation, and troubleshooting of server nodes.
  • Promote automation with event-driven remediation.
  • Provide 24/7 engineering support for high-criticality node delivery.
  • Drive onboarding, documentation, enablement, and performance management.
  • Shape culture and communication across CoreWeave teams.

🎯 Requirements

  • 7+ years in software or infrastructure engineering, with 2+ years in leadership.
  • Experience with SRE fundamentals, incident management, observability, and change management.
  • Champion automation to improve reliability and cross-team tooling.
  • Enjoy helping others grow and influencing partners, peers, and leaders.

🎁 Benefits

  • Medical, dental, and vision insurance—100% paid by CoreWeave.
  • 401(k) with generous employer match.
  • Flexible PTO.
  • Tuition reimbursement.
  • Employee Stock Purchase Program (ESPP) eligibility.
  • Mental wellness benefits through Spring Health.