Mirage
Overview
Member of Technical Staff, GPU Optimization

Mirage is the leading AI short-form video company building full-stack foundation models and products for video creation, production, and editing. We are a rapidly growing team in NYC seeking an expert in making AI models run fast at scale.

Responsibilities
- Optimize model training and inference pipelines for throughput, latency, and memory efficiency on NVIDIA GPUs, including data loading, preprocessing, checkpointing, and deployment
- Design, implement, and benchmark custom CUDA and Triton kernels for performance-critical operations (a minimal illustrative sketch follows this list)
- Integrate low-level optimizations into PyTorch-based codebases (custom ops, low-precision formats, TorchInductor passes)
- Profile and debug the full stack, from kernel launches to multi-GPU I/O paths, using Nsight, nvprof, PyTorch Profiler, and custom tools
- Collaborate to co-design model architectures and data pipelines that are hardware-friendly and maintain state-of-the-art quality
- Stay current with GPU and compiler technologies (e.g., Hopper features, CUDA Graphs, Triton, FlashAttention) and assess their impact
- Work with infrastructure and backend teams to improve cluster orchestration, scaling strategies, and observability for large experiments
- Provide data-driven insights and trade-offs between performance, quality, and cost
- Support a culture of fast iteration, profiling, and performance-centered design
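For illustration only, here is a minimal sketch of the kind of fused Triton kernel this work involves: an elementwise add fused with a ReLU so the intermediate tensor never makes an extra round trip through global memory. The kernel name, block size, and shapes are hypothetical, not part of the role description.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def fused_add_relu_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    # Fusing the add and the ReLU avoids materializing the intermediate in global memory.
    tl.store(out_ptr + offsets, tl.maximum(x + y, 0.0), mask=mask)

def fused_add_relu(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n_elements = out.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    fused_add_relu_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out
```

In practice a kernel like this would be benchmarked against the eager PyTorch baseline (e.g., with triton.testing.do_bench) before being wired in as a custom op.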
Qualifications

- Bachelor's degree in Computer Science, Electrical/Computer Engineering, or equivalent practical experience
- 3+ years of hands-on experience writing and optimizing CUDA kernels for production ML workloads
- Deep understanding of GPU architecture: memory hierarchies, warp scheduling, tensor cores, register pressure, and occupancy tuning
- Strong Python skills and familiarity with PyTorch internals, TorchScript, and distributed data-parallel training
- Proven track record profiling and accelerating large-scale training and inference (e.g., mixed precision, kernel fusion, custom collectives); an illustrative profiling sketch follows this list
- Experience working in Linux environments with modern CI/CD, containers, and cluster managers such as Kubernetes
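Again purely as an illustration of the profiling work referenced above, the sketch below uses the PyTorch Profiler to rank operators by GPU time; the module and tensor shapes are hypothetical stand-ins.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Hypothetical stand-in model; any nn.Module running on the GPU works the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 4096),
).cuda()
x = torch.randn(64, 4096, device="cuda")

# Record CPU and CUDA activity for a few forward passes, keeping input shapes.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
             record_shapes=True) as prof:
    with torch.no_grad():
        for _ in range(5):
            model(x)

# Rank operators by total GPU time to find fusion or precision candidates.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```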
Preferred Qualifications

- Advanced degree (MS/PhD) in CS, EE, or a related field
- Experience with multi-modal AI systems, video generation, or computer vision models
- Familiarity with distributed training frameworks (DeepSpeed, FairScale, Megatron) and model parallelism
- Knowledge of compiler optimization techniques and frameworks such as MLIR or XLA
- Experience with cloud infrastructure (AWS, GCP, Azure) and GPU cluster management
- Ability to translate research goals into performant code, balancing numerical fidelity with hardware constraints
- Strong communication skills and experience mentoring junior engineers
Benefits

- Comprehensive medical, dental, and vision plans
- 401K with employer match
- Commuter benefits
- Catered lunch multiple days per week
- Dinner stipend when working late
- Grubhub subscription
- Health & wellness perks (Talkspace, Kindbody, One Medical, HealthAdvocate, Teladoc)
- Team offsites and monthly team events
- Generous PTO policy

Captions provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type. Benefits apply to full-time employees only.

Compensation Range: $215K - $300K
Location & Status

Role requires in-person work at our NYC HQ (Union Square).
Employment type: Full-time.
Seniority level: Mid-Senior level.