Scale AI

AI Infrastructure Engineer, Core Infrastructure

Scale AI, Seattle, Washington, US, 98127

Design and build the next generation of foundational systems that power all ML Infrastructure compute at Scale.

Responsibilities

Design and maintain fault‑tolerant, cost‑efficient systems that manage compute allocation, scheduling, and autoscaling across clusters and clouds.

Build common abstractions and APIs that unify job submission, telemetry, and observability across serving and training workloads.

Develop systems for usage metering, cost attribution, and quota management, enabling transparency and control over compute budgets.

Improve reliability and efficiency of large‑scale GPU workloads through better scheduling, bin‑packing, preemption, and resource sharing.

Partner with ML engineers and API teams to identify bottlenecks and define long‑term architectural standards.

Lead projects end‑to‑end, from requirements gathering and design through rollout and monitoring, in a cross‑functional environment.

Qualifications

4+ years of experience building large‑scale backend or distributed systems.

Strong programming skills in Python, Go, or Rust, and familiarity with modern cloud‑native architecture.

Experience with containers and orchestration tools (Kubernetes, Docker) and Infrastructure as Code (Terraform).

Familiarity with schedulers or workload‑management systems such as Kubernetes controllers, Slurm, Ray, or internal job queues.

Understanding of observability and reliability practices (metrics, tracing, alerting, SLOs).

Track record of improving system efficiency, reliability, or developer velocity in production environments.

Nice to Haves

Experience with multi‑tenant compute platforms or internal PaaS.

Knowledge of GPU scheduling, cost modeling, or hybrid cloud orchestration.

Familiarity with LLM or ML training workloads.

Benefits & Compensation

Compensation includes base salary, equity, and benefits. Base salary range for this role in San Francisco, New York, and Seattle: $179,400 to $310,500 USD.

Benefits include health, dental, and vision coverage, retirement benefits, a learning stipend, PTO, and a commuter stipend.

We are an equal‑opportunity employer and comply with all applicable laws. We provide reasonable accommodations for applicants with disabilities.

Additional privacy and personal data handling statements as required.
