Darwin Recruitment

Staff LLM Systems Engineer

Darwin Recruitment, San Francisco, California, United States, 94199

Location: United States (West Coast preferred, remote considered)

About the Company

We are a rapidly growing AI company delivering large language models at scale. Our mission is to ensure models not only perform well in research but also serve real-world applications reliably and efficiently. We are looking for engineers who enjoy solving high-scale inference and systems challenges.

Role Overview

We are seeking a Senior / Staff LLM Systems Engineer to lead the development, optimization, and deployment of large language model inference pipelines. This role focuses on high-throughput, low-latency serving and production reliability, bridging ML research and platform engineering.

This is not a training-focused role: the emphasis is on serving models at scale, optimizing systems, and enabling production ML reliability.

Responsibilities

Design, implement, and optimize inference pipelines for large language models

Improve throughput and latency of model serving in production environments

Collaborate closely with infrastructure, platform, and ML research teams to ensure smooth deployment

Build monitoring, observability, and alerting systems for inference performance and reliability

Identify and solve scaling challenges across GPUs, TPUs, or distributed environments

Evaluate and adopt new technologies, frameworks, and architectures to improve inference efficiency

Mentor other engineers and contribute to technical strategy for production ML systems

Qualifications

5+ years of software engineering experience, including hands-on ML systems experience

Strong background in distributed systems, performance tuning, and low-latency architectures

Experience with model serving frameworks (e.g., Triton, vLLM, Ray, TorchServe)

Familiarity with GPU/TPU infrastructure, multi-node deployment, and system-level optimization

Understanding of ML workloads and trade-offs between accuracy, latency, and cost

Proven ability to deliver production-grade ML systems at scale

Excellent collaboration and problem-solving skills

Why You’ll Enjoy This Role

Work on cutting-edge LLM inference systems at scale

Solve technically challenging, high-impact engineering problems

Collaborate with top ML researchers and platform engineers

Competitive compensation and flexible work arrangements

Darwin Recruitment is acting as an Employment Agency in relation to this vacancy.

Reece Waldon
