asobbi
This range is provided by asobbi. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more.
Base pay range $220,000.00/yr - $300,000.00/yr
Solutions Architect – HPC/AI/ML | Remote | Exciting AI Infrastructure Scale-Up
We're working with an innovative client at the forefront of AI/ML infrastructure, providing cutting‑edge solutions that power large‑scale distributed training and inference workloads. They're looking for an exceptional Solutions Architect to join their growing team and work directly with customers pushing the boundaries of what's possible with AI.
The Role
This is a truly exciting opportunity to bridge the gap between bleeding‑edge technology and real‑world enterprise applications. You'll be the technical expert who architects and deploys sophisticated Kubernetes environments and high‑performance networking solutions specifically designed for AI/ML and HPC workloads.
Designing and implementing Kubernetes environments with high‑performance networking for demanding AI/ML workloads
Supporting customers with Slurm‑based workload management to optimize their large‑scale distributed training and inference
Creating proof‑of‑concept projects and benchmarking performance to demonstrate value
Acting as a trusted technical advisor, understanding customer business needs and developing tailored, scalable solutions
Providing deep expertise on GPU acceleration, distributed computing, and AI frameworks
Collaborating with product and engineering teams, using customer insights to shape the product roadmap
What You'll Bring
Essential Technical Skills
Bachelor's degree in Computer Science, Electrical Engineering, Data Science, or related field
7+ years' experience as a Solutions Architect, Technical Account Manager, or Cloud Engineer in AI, HPC, or cloud computing
Deep expertise in cloud computing concepts and architecture, with practical experience designing scalable infrastructure
Strong knowledge of high‑performance networking, particularly InfiniBand fabric architecture and configuration
Hands‑on experience with Kubernetes for orchestrating containerized workloads at scale, including custom resource definitions and operators
Proven experience with Slurm workload manager for scheduling and managing large‑scale distributed AI/ML training jobs
Solid understanding of NVIDIA GPU architectures (A100, H100, etc.) and their optimal configurations for different workload types
Practical knowledge of NVIDIA NCCL for multi‑GPU and multi‑node communication optimization
Demonstrated ability to design and implement complex, production‑grade infrastructure solutions from the ground up
Experience troubleshooting performance bottlenecks in distributed AI/ML systems
Highly Desirable
Master's or PhD in AI, Machine Learning, High‑Performance Computing, or Cloud Computing
Experience with bare metal infrastructure provisioning and configuration for AI workloads
Knowledge of containerized AI workflow platforms such as Kubeflow for MLOps pipelines and MLflow for experiment tracking
Familiarity with high‑performance storage architectures, including Lustre parallel file systems and GPUDirect Storage for eliminating CPU bottlenecks
Understanding of popular AI/ML frameworks (PyTorch, TensorFlow, JAX) and their distributed training capabilities
Experience with network performance tuning and RDMA protocols
Knowledge of container runtimes optimized for GPU workloads
What's On Offer
Generous equity scheme (2x base salary)
Company bonus
Comprehensive medical, dental, and vision insurance for you and your family
401(k) with generous employer match
Company‑paid life insurance
Flexible Spending Account
Mental wellness benefits
Flexible PTO
A dynamic, innovative work culture focused on disruption
Interested?
If you're passionate about AI infrastructure and want to work with customers doing genuinely ground‑breaking work, I'd love to hear from you. Please get in touch to discuss this opportunity further.
Seniority level: Mid‑Senior level
Employment type: Full‑time
Job function: Information Technology
Industries: Staffing and Recruiting & IT Services and IT Consulting
Location: Remote (US and Europe)
City: New York, NY (Remote)