XPENG & Volkswagen Group

GPGPU Software Architect / Principal Engineer

XPENG & Volkswagen Group, Santa Clara, California, US, 95053

XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its vehicles, including electric vehicles (EVs), electric vertical take-off and landing (eVTOL) aircraft, and robotics. With a strong focus on intelligent mobility, XPENG is dedicated to reshaping the future of transportation through cutting-edge R&D in AI, machine learning, and smart connectivity.

We are transitioning our software stack to a General Purpose GPU (GPGPU) architecture and embracing the CUDA ecosystem. Our goal is to achieve over 90% compatibility with cuBLAS/cuDNN on Linux across PCIe and CXL connections, while delivering at least 1.3x the performance of existing solutions on Transformer and Stable-Diffusion workloads.

Job Responsibilities:

Software Technical Strategy

- Develop and refine a comprehensive 3-year roadmap for a CUDA-compatible software stack, encompassing Runtime, Driver, Compiler, Profiler, Debugger, and AI acceleration libraries
- Define binding specifications that link our upcoming GPU ISA to CUDA APIs, ensuring forward compatibility with CUDA 12.x features
- Evaluate and integrate the latest technological advancements: CUDA Graph, Transformer Engine, virtual memory management, CUDA Dynamic Parallelism, CUTLASS 3.x, TMA, Blackwell FP4, among others
- Define the task launch protocol, including Queue, Stream, Event, and Graph, as well as the memory model
- Design a dual-mode (JIT & offline) compiler supporting LTO, PGO, Auto-Tuning, and efficient PTX→ISA microcode caching
- Develop GPU virtualization schemes (MIG) that work across processes and containers

Performance & Observability

- Build an observability platform: Nsys-compatible traces, real-time Metric-QPS dashboards, and an AI Advisor for identifying bottlenecks automatically
- Manage internal AI benchmarks as the single source of truth, including MLPerf Inference, Stable Diffusion XL, and 70B LLM

Cross-functional Collaboration

- Co-design an ISA compatible with CUDA Compute Capability 12.x with the hardware architecture team
- Collaborate with AI framework teams (PyTorch, TensorFlow, JAX, ONNX Runtime) to build fully reusable kernel libraries
- Partner with Cloud and Kubernetes teams to co-develop Device Plugins, GPU Operators, and RDMA Network Policies

Qualifications

- 10+ years in systems software, with at least 5 years designing CUDA compute stacks
- Led end-to-end development of a full generation of a GPU Runtime or AI acceleration library
- Comprehensive mastery of PTX/SASS, the CUDA Driver API, and cuBLAS/cuDNN internals; experience with the LLVM NVPTX backend
- Profound understanding of GPU micro-architecture, including SM architecture, Warp Scheduler, Shared-Memory conflicts, and Tensor Core pipelines
- Proficiency with PCIe/CXL/RDMA topologies, NUMA settings, and GPU Direct RDMA/Storage

Salary range: The base salary for this full-time position is $241,800 - $409,200, in addition to bonus, equity, and benefits. Salary ranges are determined by role, level, and location and reflect minimum and maximum targets for new hire salaries across US locations. Within the range, pay is influenced by location and factors such as skills, experience, and education.

We are an Equal Opportunity Employer. It is our policy to provide equal employment opportunities to all qualified persons without regard to race, age, color, sex, sexual orientation, religion, national origin, disability, veteran status, marital status, or any other category protected by law.
