Energy Jobline ZR

Principal Engineer - High-Performance AI Infrastructure in San Jose

Energy Jobline ZR, San Jose, California, United States, 95199


Energy Jobline is the largest and fastest-growing global Energy Job Board and Energy Hub. We have an audience reach of over 7 million energy professionals, 400,000+ monthly advertised global energy and engineering jobs, and we work with the leading energy companies worldwide.

We focus on the Oil & Gas, Renewables, Engineering, Power, and Nuclear markets as well as emerging technologies in EV, Battery, and Fusion. We are committed to ensuring that we offer the most exciting career opportunities from around the world for our jobseekers.

Job Description

As a Principal Engineer for HPC and AI Infrastructure, you’ll take a lead role in designing the low-level systems that maximize GPU utilization across large, mission-critical workloads.

Working within our GPU Runtime & Systems team, you’ll focus on device drivers, kernel-level optimizations, and runtime performance to ensure GPU clusters deliver the highest throughput, lowest latency, and greatest reliability possible. Your work will directly accelerate workloads across deep learning, high-performance computing, and real-time simulation.

This position sits at the intersection of systems programming, GPU architecture, and HPC-scale computing: a unique opportunity to shape infrastructure used by developers and enterprises worldwide.

Key Responsibilities

Build and optimize device drivers and runtime components for GPUs and high-speed interconnects.

Collaborate with kernel and platform teams to design efficient memory pathways (pinned memory, peer-to-peer, unified memory).

Improve data transfers across NVLink, InfiniBand, PCIe, and RDMA to reduce latency and boost throughput.

Enhance GPU memory operations with NUMA-aware strategies and hardware-coherent optimizations.

Implement telemetry and observability tools to monitor GPU performance with minimal runtime overhead.

Contribute to internal debugging/profiling tools for GPU workloads.

Mentor engineers on best practices for GPU systems development and participate in peer design/code reviews.

Stay ahead of evolving GPU and interconnect architectures to influence future infrastructure design.
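Much of the memory-pathway work described above revolves around a small set of recurring patterns. As a rough illustration only (our sketch, not the team's actual code, and it requires the CUDA toolkit to build), pinned (page-locked) host memory is what allows asynchronous host-to-device copies to overlap with other work on a stream:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const size_t bytes = 1 << 20;
    float *h_buf = nullptr, *d_buf = nullptr;

    // Pinned allocation: the pages cannot be swapped out, so the DMA
    // engine can transfer directly and cudaMemcpyAsync is truly async.
    cudaHostAlloc(&h_buf, bytes, cudaHostAllocDefault);
    cudaMalloc(&d_buf, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Returns immediately; the CPU is free to queue more work
    // (e.g. a kernel launch on the same stream) behind the copy.
    cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream);  // wait for the transfer to complete

    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    cudaStreamDestroy(stream);
    return 0;
}
```

With pageable (ordinary `malloc`) memory, the runtime must stage the copy through an internal pinned buffer, which serializes the transfer; measuring that difference with Nsight Systems is a typical first step in the kind of latency work this role involves.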

Minimum Qualifications

Bachelor’s degree in a technical field (STEM), with 10+ years in systems programming, including 5+ years in GPU runtime or driver development.

Experience developing kernel-space modules or runtime libraries (CUDA, ROCm, OpenCL).

Deep familiarity with NVIDIA GPUs, CUDA toolchains, and profiling tools (Nsight, CUPTI, etc.).

Proven ability to optimize workloads across NVLink, PCIe, Unified Memory, and NUMA systems.

Hands-on background in RDMA, InfiniBand, GPUDirect, and related communication frameworks (UCX).

Strong C/C++ programming skills with systems-level expertise (memory management, synchronization, cache coherency).

Preferred Qualifications

Expertise in HPC workload optimization and GPU compute/memory tradeoffs.

Knowledge of pinned memory, peer-to-peer transfers, zero-copy, and GPU memory lifetimes.

Strong grasp of multithreaded and asynchronous programming patterns.

Familiarity with AI frameworks (PyTorch, TensorFlow) and Python scripting.

Understanding of low-level CUDA/PTX assembly for debugging or performance tuning.

Experience with storage offloads (NVMe, IOAT, DPDK) or DMA-based acceleration.

Proficiency with system profiling/debugging tools (Valgrind, cuda-memcheck, gdb, Nsight Compute/Systems, perf, eBPF).

An advanced degree (PhD) with research in GPU systems, compilers, or HPC is a plus.

If you are interested in applying for this job, please press the Apply button and follow the application process. Energy Jobline wishes you the very best of luck in your next career move.