The Rundown AI, Inc.
Sr. Staff Software Engineer – High Performance GPU Inference Systems
Mission:
Push the limits of heterogeneous GPU environments, dynamic global scheduling, and end-to-end system performance—all while running code as close to the metal as possible.
Responsibilities & opportunities in this role:
- Distributed Systems Engineering: Design and implement scalable, low-latency runtime systems that coordinate thousands of GPUs across tightly integrated, software-defined infrastructure.
- Low-Level GPU Optimization: Build deterministic, hardware-aware abstractions optimized for CUDA, ROCm, or vendor-specific toolchains, ensuring ultra-efficient execution, fault isolation, and reliability.
- Performance & Diagnostics: Develop profiling, observability, and diagnostics tooling for real-time insights into GPU utilization, memory bottlenecks, and latency deviations—continuously improving system SLOs.
- Next-Gen Enablement: Future-proof the stack to support evolving GPU architectures (e.g., H100, MI300), NVLink/Fabric topologies, and multi-accelerator systems (including FPGAs or custom silicon).
- Cross-Functional Collaboration: Work closely with teams across ML compilers, orchestration, cloud infrastructure, and hardware ops to ensure architectural alignment and unlock joint performance wins.
Ideal candidates have/are:
- Proven ability to ship high-performance, production-grade distributed systems and maintain large-scale GPU production deployments.
- Deep knowledge of GPU architecture (memory hierarchies, streams, kernels), OS internals, parallel algorithms, and HW/SW co-design principles.
- Proficient in systems languages such as C++ (CUDA), Python, or Rust—with fluency in writing hardware-aware code.
- Obsessed with performance profiling, GPU kernel tuning, memory coalescing, and resource-aware scheduling.
- Passionate about automation, testability, and continuous integration in large-scale systems.
- Comfortable navigating across stack layers—from GPU drivers and kernels to orchestration layers and inference serving.
- Strong communicator, pragmatic problem-solver, and builder of clean, sustainable code.
- Ownership-driven mindset: your code runs fast, scales gracefully, and meets real-world demands.
Additionally, nice to have:
- Experience operating large-scale GPU inference systems in production (e.g., Triton, TensorRT, or custom GPU services).
- Experience deploying and optimizing ML/HPC workloads on GPU clusters (Kubernetes, Slurm, Ray, etc.).
- Hands-on experience with multi-GPU training/inference frameworks (e.g., PyTorch DDP, DeepSpeed, or JAX).
- Familiarity with compiler tooling (e.g., TVM, MLIR, XLA) or deep learning graph optimization.
- Successful track record of delivering technically ambitious projects in fast-paced environments.
Attributes of a Groqster:
- Humility - Egos are checked at the door
- Collaborative & Team Savvy - We make up the smartest person in the room, together
- Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
- Curious & Innovative - Take a creative approach to projects, problems, and design
- Passion, Grit, & Boldness - no-limit thinking, fueling informed risk-taking
If this sounds like you, we’d love to hear from you!
Compensation: At Groq, a competitive base salary is part of our comprehensive compensation package, which includes equity and benefits. For this role, the base salary range is $248,710 to $292,600, determined by your skills, qualifications, experience, and internal benchmarks.