XPENG Deutschland
GPGPU Software Architect/ Principal Engineer
XPENG Deutschland, San Diego, California, United States, 92189
XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its vehicles, including electric vehicles (EVs), electric vertical take-off and landing (eVTOL) aircraft, and robotics. With a strong focus on intelligent mobility, XPENG is dedicated to reshaping the future of transportation through cutting-edge R&D in AI, machine learning, and smart connectivity.
Our pioneering first-generation NPU, utilizing DSA architecture, has successfully entered mass production. We're currently validating the architecture of our second generation and are making the strategic decision to transition towards General Purpose GPU (GPGPU) architecture.
We're completely overhauling our software stack and embracing the CUDA ecosystem. Our goal is to achieve over 90% compatibility with cuBLAS/cuDNN on Linux across PCIe and CXL connections, all while delivering at least 1.3 times the performance of existing solutions on Transformer and Stable-Diffusion workloads.
Job Responsibilities
Software Technical Strategy
Develop and refine a comprehensive 3-year roadmap for a software stack compatible with CUDA, encompassing Runtime, Driver, Compiler, Profiler, Debugger, and AI acceleration libraries
Define binding specifications that link our upcoming GPU ISA to CUDA APIs, ensuring forward compatibility with CUDA 12.x features
Evaluate and integrate the latest technological advancements: CUDA Graph, Transformer Engine, virtual memory management, CUDA dynamic CUTLASS 3.x, TMA, Blackwell FP4, among others
Architecture & Design
Create a modular, layered Runtime architecture (CUDA → HAL → Kernel → Hardware) that applies across emulators and actual silicon
Define the task launch protocol, including Queue, Stream, Event, and Graph, as well as the memory model
Design a dual-mode (JIT and offline) compiler supporting LTO, PGO, Auto-Tuning, and efficient PTX→ISA microcode caching
Develop GPU virtualization schemes (MIG) that work across processes and containers
Performance & Observability
Implement an end-to-end performance model: Python API → CUDA Runtime → Driver → ISA → Micro-architecture → Board-level interconnect
Build an observability platform: Nsys-compatible traces, real-time Metric-QPS dashboards, and an AI Advisor that identifies bottlenecks automatically
Manage internal AI benchmarks as the single source of truth. Benchmarks include MLPerf Inference, Stable Diffusion XL, and a 70B LLM
Cross-functional Collaboration
Co-design, with our hardware architecture team, an ISA that is compatible with CUDA Compute Capability 12.x
Collaborate with AI framework teams (PyTorch, TensorFlow, JAX, ONNX Runtime) to build fully reusable kernel libraries
Partner with Cloud and K8s teams to co-develop Device Plugins, GPU Operators, and RDMA Network Policies
Minimum Requirements
10+ years in systems software, with at least 5 years designing CUDA Compute stacks
Led end-to-end development of a generation of a GPU Runtime or AI acceleration library
Comprehensive mastery of PTX/SASS, the CUDA Driver API, and cuBLAS/cuDNN internals; experience with the LLVM NVPTX backend
Profound understanding of GPU micro-architecture, including SM architecture, Warp Scheduler, Shared-Memory conflicts, and Tensor Core pipelines
Proficiency with PCIe/CXL/RDMA topologies, NUMA settings, and GPUDirect RDMA/Storage
The base salary range for this full-time position is $241,800 - $409,200 in addition to bonus, equity and benefits. Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training.
We are an Equal Opportunity Employer. It is our policy to provide equal employment opportunities to all qualified persons without regard to race, age, color, sex, sexual orientation, religion, national origin, disability, veteran status or marital status or any other prescribed category set forth in federal or state regulations.
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Motor Vehicle Manufacturing