Luminai

Senior Software Engineer, Infrastructure

Luminai, San Mateo, California, United States, 94409

About Luminai

Nearly every organization in the world relies on complex manual work to carry out critical internal processes. These are the processes that keep the world running: enrolling patients in a hospital, underwriting loans inside a bank, or processing new transactions for an airline. Yet most companies lack the resources to automate these tasks properly and remain stuck in manual, decades‑old ways of working.

At Luminai, we develop technology that uses AI to automate long‑form, organization‑wide workflows of any complexity, easily and safely. Luminai serves some of the world’s most critical organizations in sectors such as healthcare, finance, and telecommunications, enabling them to delegate mission‑critical workflows that previously required hands‑on human involvement to autonomous AI systems. Our approach combines frontier AI development with a purpose‑built workflow execution engine.

We’ve raised significant capital (some of it unannounced) from leading Silicon Valley VCs, including General Catalyst and Y Combinator, as well as from investors such as Kevin Weil (Chief Product Officer at OpenAI), Arash Ferdowsi (co‑founder of Dropbox), Katie Stanton (former VP Global Media, Twitter), and the CEOs of companies including Flexport, Notion, Front, Ramp, and Twitch.

About the Role

We’re looking for a Senior Platform Engineer to join our Infrastructure team and help build and scale a self‑hosted, cloud‑native platform for both production and air‑gapped/on‑prem environments.

You will work closely with our existing senior engineers (who have built the current platform from the ground up) to evolve our AWS/Azure‑based Kubernetes infrastructure, optimize CI/CD workflows, maintain GitOps pipelines, and ensure local/dev/prod environment parity.

This is a high‑ownership role. You’ll be expected to contribute to architecture‑level decisions, write production‑grade code, and own infrastructure that supports mission‑critical services.

What You’ll Be Working On

Extend and maintain a multi‑cluster Kubernetes setup on AWS and Azure via Terraform modules (EKS, VPC, IRSA, ECR, IAM, S3, KMS, etc.)

Develop and maintain Kubernetes Helm charts and distroless Docker images for self‑hosted deployments

Own and evolve the platform stack:

cert‑manager, external‑dns, ingress‑nginx, Istio ambient mode, MinIO, Karpenter, OpenTelemetry/SigNoz, Velero, Pomerium, Temporal, Redis, etc.

Support and optimize CI/CD systems with GitHub Actions, custom self‑hosted runners, and Skaffold‑based PR environments

Maintain a robust GitOps deployment model using ArgoCD, SOPS, and external-secrets

Contribute to our local development experience using k3d clusters, helping teams onboard and maintain environment parity

Improve platform observability and reliability through SigNoz, Pyroscope, custom dashboards, and alerts

Enable and support air‑gapped / on‑prem installations by ensuring self‑hostability and minimal third‑party dependencies

You Should Have

5+ years of experience in DevOps, SRE, or Platform Engineering roles

Proven expertise with Kubernetes and production‑grade Helm‑based deployments

Strong experience building infrastructure with Terraform, including complex module systems and environment separation

Experience deploying and managing GitOps pipelines using ArgoCD or Flux

Proficiency in designing secure, scalable CI/CD pipelines with tools like GitHub Actions, Skaffold, and Docker build optimizations

Deep understanding of networking, ingress controllers, TLS, and service‑to‑service communication (especially with Istio, including ambient mode)

Experience with cloud‑native observability: tracing, metrics, and logging (OpenTelemetry, SigNoz, Pyroscope, Prometheus, etc.)

Practical knowledge of security best practices: IRSA, KMS, SOPS, secrets management

Comfortable with air‑gapped / self‑hosted system constraints

Ability to write clean, modular Go or Python when needed for controllers/operators/scripts

Bonus Experience

Building or maintaining custom Kubernetes operators with CRDs (esp. for ephemeral workloads)

Prior experience migrating from SaaS to self‑hosted alternatives

Exposure to on‑premise infrastructure and air‑gapped delivery pipelines

Familiarity with platform components such as Keycloak, Temporal, MinIO, and Pomerium

Experience using or maintaining self‑hosted GitHub Actions runners with docker‑in‑docker caching setups
