Kumo
Software Engineer Lead – Cloud Infrastructure
At Kumo, we are building the infrastructure layer for the next generation of enterprise AI — a platform that lets organizations turn their data into predictive intelligence instantly, without the heavy lifting of traditional ML pipelines. We have also built our own Relational Foundation Model that can provide predictions in seconds – no training, straight to business value!
Join a dynamic, rapidly expanding team of innovators from top-tier companies like Airbnb, LinkedIn, Pinterest, and Stanford, supported by Sequoia Capital. We’re on the front lines of AI, solving some of its most challenging problems, and have delivered more than $500M in tangible value to industry giants like Reddit, DoorDash, and Databricks.
The Opportunity
We’re hiring a Lead / Staff+ Infrastructure Engineer to own the architecture, reliability, and evolution of Kumo’s multi‑tenant AI platform. This is a hands‑on leadership role: design high‑leverage systems, make critical architectural decisions, mentor engineers, drive cross‑functional roadmaps, and spend an equal amount of time writing code and running production services.
What You’ll Own
Set the technical vision and roadmap for Kumo’s multi‑tenant infrastructure across AWS, Azure, and GCP, balancing scalability, reliability, cost, and security.
Lead architecture and design for critical systems: Kubernetes‑based multi‑tenancy, real‑time inference clusters, training pipelines, and CI/CD for large ML workloads.
Hands‑on implementation: build and evolve IaC, GitOps flows, cluster autoscaling, and automation that reduce toil and accelerate developer productivity.
Define and drive SLOs, SLIs, and capacity planning; lead incident response, post‑mortems, and systemic remediation.
Own cost optimization at scale — from resource scheduling to spot/commit strategies and cross‑cloud lifecycle management.
Mentor and grow engineers: set standards for architecture reviews, design docs, code quality, and operational excellence.
Hire and help scale the team — participate in recruiting, interviewing, and onboarding top‑tier infrastructure talent.
What You Bring
5‑8+ years building and operating production cloud‑native infrastructure; proven track record leading infrastructure initiatives end‑to‑end.
Deep, practical experience with Kubernetes at scale (multi‑tenant environments, cluster federation, or large fleet operations).
Strong multi‑cloud operational experience (designing and running services across AWS/Azure/GCP) and cloud cost management.
Demonstrated design skills for distributed systems, including making architectural trade‑offs; comfortable shipping code in a high‑velocity environment (Python, Go, or similar) and reviewing complex PRs.
Proficiency in Go, Python, Rust or similar languages for automation tooling.
Excellent communicator: able to influence across engineering, ML science, product, and leadership — and to write clear design docs and trade‑off analyses.
Nice to Have
Experience building infrastructure for ML/AI platforms or relational foundation models.
Background with Spark or large‑scale data processing platforms (managed or self‑hosted).
Familiarity with Kubernetes operators, controllers, CRDs, or service mesh patterns.
Expertise with Infrastructure‑as‑Code (Terraform/Pulumi) and GitOps (ArgoCD, Flux, Argo Workflows) in production.
Experience with tenant isolation, zero‑trust identity models, and cloud security/compliance frameworks.
Prior experience building and scaling an infrastructure team (e.g., hiring, mentoring, org design).
Equal Employment Opportunity
We are an equal‑opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
Compensation
The base pay range for this role is $170,000 – $260,000 per year.
Seniority level
Mid‑Senior level
Employment type
Full‑time
Job function
Engineering and Information Technology
Software Development