NVIDIA

Solutions Architect - Cloud Infrastructure

NVIDIA, California, Missouri, United States, 65018

Base pay range: $120,000.00/yr - $235,750.00/yr

We are excited to announce an opening for a Cloud Solution Architect at NVIDIA and are seeking a passionate individual with a strong interest in large-scale GPU infrastructure and AI Factory deployments. If you are enthusiastic about contributing to projects that push the boundaries of cloud-based AI and resilience in large-scale environments, we invite you to read on. NVIDIA is renowned as one of the most sought-after employers in the technology world, offering highly competitive benefits. We are home to some of the most innovative and forward-thinking individuals globally. If you are creative, autonomous, and eager to apply your skills and knowledge in a dynamic environment, we want to hear from you!

Overview

NVIDIA is seeking a Cloud Solution Architect to contribute to AI Factory solutions and large-scale GPU infrastructure. This role involves architecting and deploying resilient, telemetry-driven AI compute environments at unprecedented scale, collaborating with engineering teams, and guiding clients on scalable, reliable, and high-performance workloads.

What You'll Be Doing

Work as a key member of the cloud solutions team, serving as the go-to technical expert on NVIDIA AI Factory solutions and large-scale GPU infrastructure, helping clients architect and deploy telemetry-driven AI compute environments at scale.
Collaborate directly with engineering teams to secure design wins, address challenges, and deploy solutions into production, focusing on robust tooling for observability, failure recovery, and infrastructure-level performance optimization.
Act as a trusted advisor to clients, understanding their cloud environment, translating requirements into technical solutions, and guiding them on optimizing NVIDIA AI Factories for scalable, reliable, and high-performance workloads.

Qualifications

2+ years of experience in large-scale cloud infrastructure engineering, distributed AI/ML systems, or GPU cluster deployment and management.
A BS in Computer Science, Electrical Engineering, Mathematics, or Physics, or equivalent experience.
Proven understanding of large-scale computing systems architecture, including multi-node GPU clusters, high-performance networking, and distributed storage.
Experience with infrastructure-as-code, automation, and configuration management for large-scale deployments.
A passion for machine learning and AI, and the drive to continually learn and apply new technologies.
Excellent interpersonal skills, including the ability to explain complex technical topics to non-experts.

Ways To Stand Out From The Crowd

Expertise with orchestration and workload management tools like Slurm, Kubernetes, Run:ai, or similar platforms for GPU resource scheduling.
Knowledge of AI training and inference performance optimization at scale, including distributed training frameworks and multi-node communication patterns.
Hands-on experience designing telemetry systems and failure recovery mechanisms for large-scale cloud infrastructures, including observability tools such as Grafana, Prometheus, and OpenTelemetry.
Proficiency in deploying and managing cloud-native solutions using platforms such as AWS, Azure, or Google Cloud, with a focus on GPU-accelerated workloads.
Deep expertise with high-performance networking technologies, particularly NVIDIA InfiniBand, NCCL, and GPUDirect RDMA for large-scale AI workloads.

Compensation and Benefits

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 120,000 USD - 189,750 USD for Level 2, and 148,000 USD - 235,750 USD for Level 3. You will also be eligible for equity and benefits.

Job Details

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Industries: Computer Hardware Manufacturing, Software Development, and Computers and Electronics Manufacturing

Additional Information

Applications for this job will be accepted at least until October 11, 2025.

NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. We value diversity and do not discriminate in hiring or promotion on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any characteristic protected by law.

JR2005440
