Lumex Talent

ML Inference Software Engineer

Lumex Talent, Palo Alto, California, United States, 94306


A well‑funded AI startup is building a platform that lets anyone instantly generate fully interactive 2D/3D worlds from natural language. Backed by a $28M seed round and founded by engineers from Stanford, NVIDIA, Meta, and Epic Games, they’re combining multimodal reasoning, simulation, graphics, and real‑time generation into one unified system.

They’re hiring a Senior ML Infrastructure Engineer to take ownership of GPU performance, model serving, and end‑to‑end inference optimization.

Base pay range: $200,000.00/yr - $500,000.00/yr

Location: Mountain View, CA

What You’ll Do

Improve model throughput, latency, and cost by 2–10×

Optimize the GPU stack using CUDA/Triton kernels, FlashAttention, paged attention, and CUDA Graphs

Build and refine inference systems with TensorRT-LLM, Triton Inference Server, vLLM/TGI

Own profiling, optimization, deployment, and validation of all core inference workflows

Work closely with research and engine teams to support real‑time world generation and simulation

What They’re Looking For

2–3+ years in ML infrastructure, GPU systems, or LLM inference

Strong background in GPU performance optimization

Experience with high‑performance serving stacks and distributed ML systems

Comfortable operating in a fast‑paced, high‑ownership startup environment

Why This Role Matters

This role directly shapes how fast their models run, how the platform scales, and how creators and agents interact inside generated worlds in real time.
