RDMA Engineer - Supercomputing

xAI, Palo Alto

About xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.

Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity.

We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important.

All engineers and researchers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role

RDMA Engineers on xAI’s Supercomputing team design and optimize low-latency, high-bandwidth networking solutions using NVIDIA’s RDMA-capable technologies to support some of the world’s largest GPU supercomputing clusters. These clusters drive AI training and inference workloads, demanding cutting-edge performance and scalability.


Focus

  • Develop and tune RDMA-based communication systems that use NVIDIA GPUs and Mellanox NICs over InfiniBand and RoCE fabrics for ultra-fast data transfer between nodes.

  • Implement and optimize GPUDirect RDMA to enable direct memory access between GPUs and network interfaces, minimizing CPU overhead.

  • Integrate RDMA solutions with Kubernetes-based workloads, ensuring seamless operation across distributed compute and storage systems.

  • Collaborate with AI researchers and infrastructure teams to accelerate data pipelines and collective communications using NCCL and MPI.

  • Troubleshoot and resolve performance bottlenecks in high-throughput, low-latency networking environments.


Ideal Experience

  • Hands-on experience with NVIDIA RDMA technologies (e.g., GPUDirect RDMA, RoCE, InfiniBand) in HPC or AI supercomputing environments.

  • Proficiency in programming with Rust, C, or C++ for low-level networking and system optimization.

  • Familiarity with NVIDIA’s networking stack, including Mellanox drivers, libraries (e.g., libibverbs), and kernel modules (e.g., nv_peer_mem / nvidia-peermem).

  • Experience optimizing distributed systems with MPI, NCCL, or similar frameworks for GPU-accelerated workloads.

  • Knowledge of Kubernetes networking and integrating RDMA into containerized environments.

  • Bonus: Background in AI/ML training workflows and their networking demands (e.g., large-scale parameter synchronization).


Tech Stack

  • NVIDIA GPUs and Mellanox networking (InfiniBand, RoCE)

  • RDMA technologies and protocols (e.g., GPUDirect RDMA, RoCEv2)

  • Kubernetes

  • Rust and C/C++

  • MPI (Message Passing Interface) and NCCL (NVIDIA Collective Communications Library)


Annual Salary Range

$180,000 - $440,000 USD

xAI is an equal opportunity employer and does not unlawfully discriminate based on race, color, religion, ethnicity, ancestry, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, age, disability, medical conditions, genetic information, marital status, military or veteran status, or any other applicable legally protected characteristics.


Qualified applicants with arrest or conviction records will be considered for employment in accordance with all applicable federal, state, and local laws, including the San Francisco Fair Chance Ordinance, Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act.


For Los Angeles County (unincorporated) Candidates:


xAI reasonably believes that criminal history may have a direct, adverse and negative relationship on the following job duties, potentially resulting in the withdrawal of a conditional offer of employment:



  • Access to information technology systems and confidential information, including proprietary and trade secret information, and/or user data;

  • Interacting with internal and/or external clients and colleagues; and

  • Exercising sound judgment.


California Consumer Privacy Act (CCPA) Notice

Seniority level: Entry level

Employment type: Full-time

Job function: Engineering and Information Technology

Industries: Technology, Information and Internet
