NVIDIA
Senior Software Engineer - Parallel Computing Systems
NVIDIA, California, Missouri, United States, 65018
NVIDIA is seeking a Senior Software Engineer for the Parallel Computing Systems team to advance AI compilation and high-performance GPU workloads.
What You'll Be Doing
As an nvFuser engineer, you will work on compiler technology and performance optimization. You will design algorithms that generate highly optimized code from deep learning programs and build GPU-aware CPU runtime systems that coordinate kernel execution for maximum performance. You will work with NVIDIA's hardware engineers to master the latest GPU architectures and collaborate with optimization specialists on techniques for emerging AI workloads. You will debug performance bottlenecks in large distributed systems and contribute to future hardware design and compiler infrastructure that advances GPU performance.
What We Need To See
MS or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a related field (or equivalent experience).
2+ years of advanced C++ programming, including large codebase development, template meta-programming, and performance-critical code.
Strong parallel programming experience with multi-threading, OpenMP, CUDA, MPI, NCCL, NVSHMEM, or other parallel computing technologies.
Experience with low-level performance optimization and systematic bottleneck identification beyond basic profiling.
Performance analysis skills: experience analyzing high-level programs to identify performance bottlenecks and develop optimization strategies.
Collaborative problem-solving with adaptability, first-principles thinking, and a sense of ownership.
Excellent verbal and written communication skills.
Ways to Stand Out
Experience with HPC/Scientific Computing: CUDA optimization, GPU programming, numerical libraries (cuBLAS, NCCL), or distributed computing.
Compiler engineering background: LLVM, GCC, domain-specific language design, program analysis, or IR transformations and optimization passes.
Deep technical foundation in CPU/GPU architectures, numeric libraries, modular software design, or runtime systems.
Experience with large software projects, performance profiling, and rapid learning.
Expertise with distributed parallelism techniques, tensor operations, auto-tuning, or performance modeling.
Compensation and Benefits
Base salary will be determined by location, experience, and the pay of employees in similar roles. Base salary range: 148,000 USD to 235,750 USD. Equity and benefits are also available.
Equal Opportunity
NVIDIA is an equal opportunity employer and values diversity in its workforce. We do not discriminate on the basis of race, religion, color, national origin, gender, gender identity, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law. Applications for this job will be accepted at least until August 29, 2025.