Big Cloud
Senior Consultant | AI / Robotics and Autonomous Systems
I'm looking for a hands-on ML Infrastructure Engineer to help scale and optimize large-scale training systems for robotics and AI. This is a high-impact role working close to the GPUs, driving inference, MLOps, and distributed training at scale.
What you’ll do:
Build and maintain infrastructure for large-scale training (scheduling, orchestration, checkpointing, metrics).
Scale JAX-based pipelines across GPU/TPU clusters for high-throughput experiments.
Optimize performance across data pipelines, model loops, and distributed sync.
Partner with researchers to turn ideas into production-ready training runs.
What we’re looking for:
Strong software engineering skills in ML infrastructure/platforms.
Hands-on experience with JAX (preferred), PyTorch, or TensorFlow.
Proven expertise in distributed training and performance optimization.
Strong communicator who thrives on collaborating with researchers and engineers.
A scrappy, ownership-driven builder who loves scaling systems fast.
This is a rare chance to work at the intersection of foundation models and robotics, helping shape the future of physical AI.
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Staffing and Recruiting