Voltai
Research Engineer - CUDA Kernel Engineering
Voltai, Palo Alto, California, United States, 94306
About Voltai
Voltai is developing world models and embodied agents that learn, evaluate, plan, experiment, and interact with the physical world. We are starting with understanding and building hardware: electronic systems and semiconductors, where AI can design and create beyond human cognitive limits.
About the Team
We are backed by Silicon Valley’s top investors, Stanford University, and CEOs/Presidents of Google, AMD, Broadcom, Marvell, etc. We are a team of former Stanford professors, SAIL researchers, Olympiad medalists (IPhO, IOI, etc.), CTOs of Synopsys & GlobalFoundries, Head of Sales & CRO of Cadence, former US Secretary of Defense, National Security Advisor, and Senior Foreign‑Policy Advisor to four US presidents.
About the role
You will develop, integrate, and optimize state‑of‑the‑art CUDA kernels to power AI models that accelerate semiconductor design and verification. Your work will enable large‑scale model training, inference, and reinforcement learning systems that reason about circuit layouts, generate and validate RTL, and optimize chip architectures, all running efficiently across thousands of GPUs. You’ll build tools, performance benchmarks, and integration layers that push the limits of GPU utilization for compute‑intensive workloads in AI‑driven hardware design. Working closely with researchers and engineers, you’ll help make Voltai the world’s leading AI + semiconductor research organization. You’ll also release your kernels and tooling as contributions to the open‑source AI and HPC ecosystems.
Responsibilities
Writing and optimizing CUDA kernels for large‑scale AI workloads (attention, routing, graph‑based operations, physics‑inspired operators, etc.)
Profiling and optimizing GPU performance for custom compute‑bound or memory‑bound workloads
Integrating custom kernels into cutting‑edge training and inference frameworks (e.g., PyTorch, Megatron, vLLM, TorchTitan)
Working with the latest NVIDIA hardware and software stacks (Hopper, Blackwell, NVLink, NCCL, Triton)
Building GPU‑accelerated primitives for graph reasoning, symbolic computation, or hardware simulation tasks
Collaborating with AI researchers and semiconductor experts to translate domain‑specific workloads into high‑performance GPU code
Qualifications
Experience with CUDA kernel development and optimization for AI workloads
Strong understanding of GPU architecture and performance analysis
Familiarity with deep learning frameworks and GPU ecosystems
Knowledge of NVIDIA hardware stacks and software libraries
Ability to translate complex algorithms into efficient GPU code
Experience collaborating with cross‑disciplinary teams (researchers, engineers)
Seniority Level: Entry level
Employment Type: Full‑time
Job Function: Engineering and Information Technology
Industries: Technology, Information and Internet
Location: Palo Alto, CA
Salary: $160,000.00 – $180,000.00