CoreWeave is a specialized cloud provider that delivers GPU compute resources at massive scale on top of the industry’s fastest and most flexible infrastructure. CoreWeave builds cloud solutions for compute-intensive use cases (VFX and rendering, machine learning and AI, batch processing, and Pixel Streaming) that are up to 35 times faster and 80% less expensive than the large, generalized public clouds. Learn more at www.coreweave.com.
In this role, you will manage the team responsible for designing, developing, and optimizing CoreWeave’s bare-metal systems. The team’s primary responsibilities include maintaining a custom Linux kernel, various Ubuntu-based OS images, the virtualization stack (KubeVirt/QEMU/VFIO), and the container/pod runtime stack (containerd/nydus/kubelet). You will be responsible for establishing and maintaining the team’s roadmap, creating a hiring plan to grow the team, and collaborating closely with cross-functional teams.
Our Team’s Stack:
- Linux Kernel (custom build, currently tracking Ubuntu HWE)
- Intel/AMD CPUs, NVIDIA GPUs, DPUs, InfiniBand and Ethernet NICs
- KubeVirt, QEMU, SR-IOV, vfio-pci
- Ubuntu 22.04
- Containerd, Kubelet
Your Responsibilities:
- Build and lead distributed teams that are focused on rapidly delivering high-quality results
- Identify opportunities to improve process flows and speed through systems and tools
- Develop measurable goals for teams, and ensure alignment across teams to deliver on them
- Facilitate constructive communication across all teams, and provide coaching and counseling to team leads through mentoring, one-on-one meetings, and regular performance feedback
- Work closely with the recruiting team to attract and evaluate talent
Team Responsibilities:
- Develop and maintain tooling to build custom Linux kernels and stateless OS images
- Automate packaging of critical components (drivers, microcode, components with out-of-tree patches, etc.)
- Serve as a senior point of contact for hardware issue escalation and troubleshooting
- Collaborate with cross-functional teams to define Linux and OS requirements, specifications, and system architecture
- Analyze and optimize the performance of bare-metal and virtualized systems, identify bottlenecks, and propose improvements for enhanced efficiency
Requirements:
- A passion for leading emerging teams in a fast-paced environment
- Experience coaching and managing other leaders and senior-level staff who work on challenging problems and rapidly deliver scalable solutions
- Prior success in creating processes and developing cross-disciplinary collaboration between engineering, operations, support, sales and product groups
- Enthusiasm for staffing, interviewing, growing and retaining talent
- A passion for operational excellence
- Excellent communication and organization skills to drive cross-disciplinary collaboration
- Exceptional verbal and written communication skills
- Exceptional attention to detail and follow-up
- Experience at a growing start-up preferred
Nice-to-haves:
- Prior experience managing distributed teams of Linux kernel developers / maintainers
- Prior experience working upstream with large open source projects (Linux kernel, Kubernetes, etc.)