Kumo
About Kumo.ai
Kumo is building a next-generation AI platform that empowers organizations to make predictive decisions faster, without the overhead of traditional ML pipelines. Backed by Sequoia and led by ex-Airbnb, Pinterest, and LinkedIn leaders, we're scaling rapidly and looking for a multi-cloud infrastructure leader to architect and run the backbone of our AI platform.
This is one of our most critical hires: your work will directly power the models and applications our customers rely on every day. If you're passionate about multi-cloud infrastructure, Kubernetes at scale, and building the infrastructure that powers the next generation of AI applications, we'd love to talk.

Why Kumo.ai?
- Work alongside world-class engineers & scientists (ex-Airbnb, Pinterest, LinkedIn, Stanford).
- Be a foundational voice in designing a platform powering enterprise-scale AI.
- Competitive Series B compensation package (salary + meaningful equity).

The Opportunity
The Cloud Infrastructure team is responsible for managing and scaling our Kubernetes-based, multi-cloud AI platform across AWS, Azure, and GCP.
You will own the architecture, scalability, security, and operational excellence of this platform, building the foundation that supports massive multi-tenant clusters running Big Data and AI/ML workloads. You will:
- Lead our multi-cloud expansion beyond AWS into Azure and GCP.
- Drive the design and implementation of Kubernetes controllers, operators, and automation for scaling and reliability.
- Implement Infrastructure as Code (Terraform, Pulumi, Crossplane) and GitOps practices to deliver commit-to-production automation at scale.
- Partner closely with ML scientists, product engineers, and leadership to deliver self-service tooling and optimize infrastructure for machine learning workloads.
You will be joining early enough to shape the architecture, culture, and processes that define our platform reliability and engineering velocity.

What You'll Do
- Architect & operate multi-cloud infrastructure (AWS, Azure, GCP) to support large-scale AI workloads.
- Design, build, and scale Kubernetes clusters (EKS, AKS, GKE, open source) for high availability, performance, and cost efficiency.
- Build and maintain Kubernetes controllers, operators, and automation for cluster lifecycle management, scaling, and workload scheduling.
- Implement observability at scale (metrics, logging, tracing) using tools like Prometheus, Grafana, and OpenTelemetry.
- Lead IaC and GitOps automation, ensuring consistent, repeatable provisioning and deployment workflows.
- Drive security and compliance policies (RBAC, tenant isolation, SOC2/GDPR readiness) into platform design.
- Partner with internal teams to enable self-service cloud resources and smooth commit-to-production pipelines.

What You Bring
- 8+ years building and operating cloud-native infrastructure in production.
- Proven multi-cloud experience: designing and running workloads across AWS, Azure, and GCP.
- Kubernetes expertise: 5+ years managing production clusters, with strong understanding of internals (schedulers, controllers, operators, CNI networking, security).
- Infrastructure-as-Code mastery: Terraform, Pulumi, Crossplane, or similar.
- GitOps and workflow automation experience (ArgoCD, Flux, Argo Workflows, or similar).
- Strong skills in monitoring and performance tuning for distributed systems.
- Proficiency in Go, Python, or Rust for automation tooling.

Nice to Have
- Experience optimizing, scaling, and maintaining multi-tenant AI/ML clusters across multiple cloud environments, ensuring high availability and performance.
- Familiarity with compliance standards (SOC2, ISO27001, GDPR).
- Contributions to open-source cloud-native projects.
- Experience building customer-facing APIs or developer tooling.

$175,000 - $250,000 a year
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.