Voiceflow
AI Infrastructure Engineer
Location: San Francisco (Onsite)
Type: Full-time
Start Date: ASAP
What You'll Do
Design and build infrastructure for deploying, scaling, and managing AI/ML workloads
Develop automation for GPU cluster provisioning, configuration, and orchestration
Build systems for hardware-aware model deployment and inference optimization
Create tooling for AI infrastructure observability, debugging, and performance tuning
Work on integration between hardware intelligence and ML frameworks
Collaborate with customers deploying large-scale AI systems in production
Optimize resource utilization across heterogeneous compute (GPUs, TPUs, custom accelerators)
What You Bring
Strong experience with:
GPU cluster management and orchestration (SLURM, Kubernetes, Ray)
ML infrastructure and frameworks (PyTorch, TensorFlow, JAX, NVIDIA stack)
Distributed training and inference systems
Container orchestration for ML workloads (Docker, Kubernetes, Kubeflow)
Linux systems programming and performance optimization
Python and systems scripting
Familiarity with:
Hardware architectures for AI (NVIDIA GPUs, AMD GPUs, custom accelerators)
High-performance networking for distributed ML (NCCL, InfiniBand, RoCE)
Model serving infrastructure (Triton, vLLM, TensorRT)
Storage systems for ML workloads (distributed filesystems, object storage)
Infrastructure as Code and GitOps workflows
What We're Looking For
We're looking for an AI infrastructure engineer who understands the full stack, from silicon to model serving, and can build systems that make AI deployment effortless.
You should have:
Deep understanding of what it takes to run AI workloads at scale
Experience with the operational challenges of GPU clusters and ML infrastructure
Ability to debug performance issues across hardware, networking, and software
Comfort working across infrastructure, ML frameworks, and developer experience
Excitement about building the foundational layer for physical AI systems
Requirements:
Bachelor's or Master's in Computer Science, Computer Engineering, or equivalent experience
3+ years of experience in ML infrastructure, MLOps, or AI platform engineering
Willingness to work startup hours, in-person (weekends included) at our San Francisco office
Work authorization in the United States
Why Join
We're building the intelligence layer for hardware: real-time systems that control physical machines with zero tolerance for latency or failure.
What we offer:
Startup-level equity and highly competitive salary
Ownership over AI infrastructure that powers next-generation systems
Problems at the intersection of hardware intelligence and machine learning
Close collaboration with customers pushing the boundaries of AI deployment
How to Apply
Email: team@cosmiclabs.io
Subject line: AI Infrastructure / [Your Name]
Include in your email:
Your name
Why this role and why Cosmic Labs
What you bring technically
Earliest available start date
GitHub or GitLab link
Confirmation of work authorization in the U.S.
Confirmation of willingness to work full-time, in-person in San Francisco
Attach:
PDF resume