Added: 5 days ago
Type: Full-time

Related skills

docker terraform github actions aws lambda

📋 Description

  • Own availability, latency, and performance targets for AWS-based platform services.
  • Design and implement monitoring, alerting, and observability across the stack.
  • Lead incident response, RCA, and postmortems for platform outages.
  • Define and track SLOs/SLAs for core platform primitives (RAG pipelines, agent orchestration, model access).
  • Build runbooks, disaster recovery procedures, and operational docs.
  • Own CI/CD pipelines for AI platform components and data pipelines.

🎯 Requirements

  • 3+ years in DevOps, SRE, or platform engineering.
  • Hands-on AWS experience, including ECS, Lambda, S3, RDS, Redshift, CloudWatch, IAM, and VPC.
  • Experience with IaC tools: Terraform or AWS CDK.
  • Strong CI/CD experience with GitHub Actions.
  • Experience with Docker and Kubernetes.
  • Familiarity with AI/ML infra patterns and observability tools.

๐ŸŽ Benefits

  • Competitive health plans.
  • Paid time off and holidays.
  • 401(k) with company match.
  • Other company-sponsored programs.