We are Genmo, a research lab dedicated to building open, state-of-the-art models for video generation towards unlocking the right brain of AGI. Join us in shaping the future of AI and pushing the boundaries of what's possible in video generation.
What You'll Do
Architect a multi-cluster infrastructure layer that spans clouds and on-prem GPU fleets.
Automate deployment, rollout, and autoscaling workflows so new models reach production with zero downtime.
Forecast & plan GPU capacity to meet latency SLOs while controlling cost.
Shape traffic policy for secure, low-latency routing and global load balancing.
Instrument & observe: deliver end-to-end telemetry and debuggability for every model and cluster.
Standardize infrastructure automation, disaster-recovery, and CI/CD practices across teams.
Drive reliability through post-incident review and continuous improvement.
Mentor & lead: share distributed-systems best practices and influence the long-term roadmap.
You Have
BS/MS/PhD in CS, EE, or a related field.
Fluency in a systems language (Go or Rust) plus Python.
Clear, concise communication and an ownership mindset.
Nice to Have
Experience tuning real-time protocols (WebRTC, gRPC, HTTP/2) for high-throughput inference.
Multi-cloud or edge deployments spanning AWS, GCP, Azure, or bare-metal providers.
Security and compliance for high-performance, distributed AI platforms.
Hands-on expertise with:
Kubernetes internals and multi-cluster operations
Infrastructure-as-code tools (Terraform, Helm) and GitOps workflows (Argo CD or Flux)
Service-mesh frameworks (Linkerd, Istio, or Envoy Gateway)
Observability stacks (Prometheus, Grafana, OpenTelemetry) and GPU telemetry (NVIDIA DCGM)
CI/CD tooling (GitHub Actions, BuildKit)
Genmo is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law. Genmo, Inc. is an E-Verify company and you may review the Notice of E-Verify Participation and the Right to Work posters in English and Spanish.
Senior Platform Engineer San Francisco, CA, US