Darwin Recruitment

Staff ML Infrastructure Engineer

Darwin Recruitment, San Francisco, California, United States, 94199

Location: United States (West Coast preferred, remote considered)

About the Company

We are a fast-growing AI company building next-generation large language models at scale. Our mission is to bring powerful, reliable AI systems into production environments used by thousands of customers. We value technical excellence, deep collaboration, and engineers who thrive on solving real-world problems at scale.

Role Overview

We are seeking a Staff / Principal ML Infrastructure Engineer to lead the design, deployment, and scaling of our large language model infrastructure. This role sits at the intersection of machine learning, systems engineering, and platform design, enabling teams to train, serve, and monitor models efficiently and reliably.

This is not a prompt engineering role; it is focused on building robust, production-grade ML infrastructure and operational pipelines.

Responsibilities

Design, implement, and maintain high-performance infrastructure for training and serving LLMs

Optimize model pipelines for efficiency, latency, and cost at scale

Collaborate with ML researchers, platform engineers, and product teams to deploy models safely into production

Build monitoring, alerting, and tooling to ensure reliability and observability of large-scale ML systems

Evaluate and integrate new frameworks, tools, and architectures to improve ML workflows

Provide technical leadership and mentorship to other engineers on the team

Qualifications

7+ years of software engineering experience, including 3+ years building production ML systems

Deep experience with distributed training and inference frameworks (e.g., PyTorch, JAX, TensorFlow)

Familiarity with model serving technologies and orchestration (e.g., Triton, Ray, Kubernetes)

Strong understanding of GPU/TPU infrastructure, performance optimization, and scalability challenges

Proven experience solving reliability, latency, and cost trade-offs in production ML systems

Excellent collaboration, communication, and problem-solving skills

Experience mentoring or leading engineering teams is a plus

Why You’ll Enjoy This Role

Work on cutting-edge LLM infrastructure at scale

Influence the design of systems that power real-world AI applications

Collaborate with some of the most talented engineers in AI

Flexible work arrangements and competitive compensation

Darwin Recruitment is acting as an Employment Agency in relation to this vacancy.
