Lambda

Forward Deployed Engineer (Site Reliability / Infrastructure)

Lambda, Seattle, Washington, US, 98127

Join us as a Forward Deployed Engineer at Lambda, a leader in AI cloud infrastructure serving thousands of customers.

Base Pay Range $240,000.00/yr – $425,000.00/yr

About the Role

We’re looking for a Forward Deployed Engineer to embed directly with a strategic customer, serving as the technical bridge between Lambda and their team. You’ll work where model performance matters most, delivery timelines are urgent, and ambiguity is the default state. Your job is to map problems, structure delivery paths, and ship solutions that create measurable impact.

What You’ll Do

Embed on-site with a named strategic customer, becoming an extension of their team

Act as the primary technical liaison between Lambda and the customer organization

Navigate ambiguous requirements to identify root problems and define clear technical solutions

Scope, sequence, and build full-stack solutions that deliver measurable business value

Design and implement infrastructure optimizations for AI/ML workloads at scale

Debug complex distributed systems issues across the infrastructure stack

Ship iteratively and learn fast, adjusting approach based on customer feedback and results

Identify reusable patterns from customer engagements that can scale across Lambda's customer base

Surface field intelligence that influences Lambda's product roadmap

Document and share learnings to elevate the capabilities of the broader team

Represent Lambda with executive presence in high‑stakes customer interactions

Location & Work Arrangement

This position requires presence at our upcoming Bellevue office or on‑site with strategic customers 4 days per week. Lambda’s designated work‑from‑home day is currently Tuesday.

About You

6+ years of experience in SRE, software engineering, or a similar role, with deep knowledge of running Linux clusters and systems

Strong programming skills in Go and Python; experience with GitOps (e.g., ArgoCD), Helm, and Kubernetes operators

Proven experience operating Kubernetes clusters in production environments (on‑prem, EKS, GKE, or similar)

Hands‑on experience with AI/ML workload management tools (Volcano, Kubeflow, or similar)

Familiarity with observability tools such as Prometheus, Grafana, and Fluent Bit, and with CI/CD pipelines

Proven experience provisioning Kubernetes using tools such as kubeadm, Cluster API, or similar

Excellent communication skills with the ability to translate technical complexity for diverse audiences

Executive presence and ability to represent Lambda in customer‑facing situations

Comfort operating in ambiguous environments with competing priorities

Strong bias for action and shipping iteratively

Nice to Have

Deep Kubernetes expertise: CRDs, CSI, CNI, and experience writing Kubernetes operators

Exposure to HPC clusters, AI/ML workloads, or large‑scale GPU clusters

Hybrid or multi‑cloud Kubernetes environment experience

Contributions to CNCF projects or Kubernetes SIGs

Why Join Us

Work on cutting‑edge managed Kubernetes platforms for AI/ML workloads

Influence the platform roadmap and help shape operations and reliability best practices

Collaborate with a highly skilled engineering team

Opportunity to mentor and grow within a fast‑growing, technology‑driven environment

Salary Range Information

The annual salary range for this position has been set based on market data and other factors. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation, identity, genetic information, veteran status, citizenship, or any other factors prohibited by law.
