The Rundown AI, Inc.

Sr. Staff Software Engineer – High Performance GPU Inference Systems


Mission:

Push the limits of heterogeneous GPU environments, dynamic global scheduling, and end-to-end system performance—all while running code as close to the metal as possible.

Responsibilities & opportunities in this role:

  • Distributed Systems Engineering: Design and implement scalable, low-latency runtime systems that coordinate thousands of GPUs across tightly integrated, software-defined infrastructure.
  • Low-Level GPU Optimization: Build deterministic, hardware-aware abstractions optimized for CUDA, ROCm, or vendor-specific toolchains, ensuring ultra-efficient execution, fault isolation, and reliability.
  • Performance & Diagnostics: Develop profiling, observability, and diagnostics tooling for real-time insights into GPU utilization, memory bottlenecks, and latency deviations—continuously improving system SLOs.
  • Next-Gen Enablement: Future-proof the stack to support evolving GPU architectures (e.g., H100, MI300), NVLink/Fabric topologies, and multi-accelerator systems (including FPGAs or custom silicon).
  • Cross-Functional Collaboration: Work closely with teams across ML compilers, orchestration, cloud infrastructure, and hardware ops to ensure architectural alignment and unlock joint performance wins.

Ideal candidates have/are:

  • Proven ability to ship high-performance, production-grade distributed systems and to maintain large-scale GPU production deployments.
  • Deep knowledge of GPU architecture (memory hierarchies, streams, kernels), OS internals, parallel algorithms, and HW/SW co-design principles.
  • Proficient in systems languages such as C++ (CUDA), Python, or Rust—with fluency in writing hardware-aware code.
  • Obsessed with performance profiling, GPU kernel tuning, memory coalescing, and resource-aware scheduling.
  • Passionate about automation, testability, and continuous integration in large-scale systems.
  • Comfortable navigating across stack layers—from GPU drivers and kernels to orchestration layers and inference serving.
  • Strong communicator, pragmatic problem-solver, and builder of clean, sustainable code.
  • Ownership-driven mindset—your code runs fast, scales gracefully, and meets real-world demands.

Additionally Nice to Have:

  • Experience operating large-scale GPU inference systems in production (e.g., Triton, TensorRT, or custom GPU services).
  • Deploying and optimizing ML/HPC workloads on GPU clusters (Kubernetes, Slurm, Ray, etc.).
  • Hands-on experience with multi-GPU training/inference frameworks (e.g., PyTorch DDP, DeepSpeed, or JAX).
  • Familiarity with compiler tooling (e.g., TVM, MLIR, XLA) or deep learning graph optimization.
  • Successful track record of delivering technically ambitious projects in fast-paced environments.

Attributes of a Groqster:

  • Humility - Egos are checked at the door
  • Collaborative & Team Savvy - We make up the smartest person in the room, together
  • Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
  • Curious & Innovative - Take a creative approach to projects, problems, and design
  • Passion, Grit, & Boldness - No-limit thinking, fueling informed risk-taking

If this sounds like you, we’d love to hear from you!

Compensation: At Groq, a competitive base salary is part of our comprehensive compensation package, which includes equity and benefits. For this role, the base salary range is $248,710 to $292,600, determined by your skills, qualifications, experience, and internal benchmarks.
