Build and maintain batch/real-time data pipelines (Snowflake, PySpark, Delta Lake, Kafka).
Implement data quality checks, schema validation, and alerting across pipelines.
Migrate legacy ETL/DWH to cloud-native AWS/Azure, reducing latency and cost.
Maintain CI/CD pipelines (tests, deployment, rollback) with GitHub Actions; IaC with Terraform.
Build end-to-end retrieval infra: ingestion, embeddings, vector stores.
Tune chunking, metadata filtering, and re-ranking for precision, recall, and latency.

🎯 Requirements

7+ years data engineering on cloud services.
2+ years production AI/ML or LLM data infra.
Deep expertise: Python, PySpark, Snowflake, Delta Lake, Kafka.
Hands-on with vector stores and retrieval infra in production RAG.
MLOps experience: MLflow, AI CI/CD, evaluation, monitoring.
Strong grounding in data governance, quality frameworks, and compliance.
Strong communication and collaboration skills.
