asobbi

AI / ML Solutions Architect

asobbi, New York, New York, US, 10261

This range is provided by asobbi. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more.

Base pay range $220,000.00/yr - $300,000.00/yr

Solutions Architect – HPC/AI/ML | Remote | Exciting AI Infrastructure Scale-Up

We're working with an innovative client who's at the forefront of AI/ML infrastructure, providing cutting‑edge solutions that power large‑scale distributed training and inference workloads. They're looking for an exceptional Solutions Architect to join their growing team and work directly with customers pushing the boundaries of what's possible with AI.

The Role

This is a truly exciting opportunity to bridge the gap between bleeding‑edge technology and real‑world enterprise applications. You'll be the technical expert who architects and deploys sophisticated Kubernetes environments and high‑performance networking solutions specifically designed for AI/ML and HPC workloads.

Designing and implementing Kubernetes environments with high‑performance networking for demanding AI/ML workloads

Supporting customers with Slurm‑based workload management to optimize their large‑scale distributed training and inference

Creating proof‑of‑concept projects and benchmarking performance to demonstrate value

Acting as a trusted technical advisor, understanding customer business needs and developing tailored, scalable solutions

Providing deep expertise on GPU acceleration, distributed computing, and AI frameworks

Collaborating with product and engineering teams, using customer insights to shape the product roadmap

What You'll Bring

Essential Technical Skills

Bachelor's degree in Computer Science, Electrical Engineering, Data Science, or related field

7+ years' experience as a Solutions Architect, Technical Account Manager, or Cloud Engineer in AI, HPC, or cloud computing

Deep expertise in cloud computing concepts and architecture, with practical experience designing scalable infrastructure

Strong knowledge of high‑performance networking, particularly InfiniBand fabric architecture and configuration

Hands‑on experience with Kubernetes for orchestrating containerized workloads at scale, including custom resource definitions and operators

Proven experience with Slurm workload manager for scheduling and managing large‑scale distributed AI/ML training jobs

Solid understanding of NVIDIA GPU architectures (A100, H100, etc.) and their optimal configurations for different workload types

Practical knowledge of NVIDIA NCCL for multi‑GPU and multi‑node communication optimization

Demonstrated ability to design and implement complex, production‑grade infrastructure solutions from the ground up

Experience troubleshooting performance bottlenecks in distributed AI/ML systems

Highly Desirable

Master's or PhD in AI, Machine Learning, High‑Performance Computing, or Cloud Computing

Experience with bare metal infrastructure provisioning and configuration for AI workloads

Knowledge of containerized AI workflow platforms such as Kubeflow for MLOps pipelines and MLflow for experiment tracking

Familiarity with high‑performance storage architectures including Lustre parallel file systems and GPUDirect Storage for eliminating CPU bottlenecks

Understanding of popular AI/ML frameworks (PyTorch, TensorFlow, JAX) and their distributed training capabilities

Experience with network performance tuning and RDMA protocols

Knowledge of container runtimes optimized for GPU workloads

What's On Offer

Generous equity scheme (2x base salary)

Company bonus

Comprehensive medical, dental, and vision insurance for you and your family

401(k) with generous employer match

Company‑paid life insurance

Flexible Spending Account

Mental wellness benefits

Flexible PTO

A dynamic, innovative work culture focused on disruption

Interested? If you're passionate about AI infrastructure and want to work with customers doing genuinely ground‑breaking work, I'd love to hear from you. Please get in touch to discuss this opportunity further.

Seniority level: Mid‑Senior level

Employment type: Full‑time

Job function: Information Technology

Industries: Staffing and Recruiting & IT Services and IT Consulting

Location: Remote (US and Europe)

City: New York, NY (Remote)
