Smallest Inc.
What You’ll Do
Dive into model architectures (ASR / TTS / SLMs) and optimize them for specific GPUs and hardware profiles
Build, debug, and tune kernels using CUDA / Tinygrad / AMD toolchains
Convert, optimize, and benchmark models using TensorRT, ONNX, and other inference engines
Work hands-on with PyTorch to train, fine-tune, and evaluate real-time speech models
Run large-scale experiments, manage datasets, and analyze model performance at scale
Productionize models for ultra-low latency speech workloads
Collaborate with research, infra, and product teams to push models into production
Requirements
Strong experience with CUDA, Tinygrad, AMD GPU toolkit, or similar low-level GPU programming stacks
Hands‑on proficiency with PyTorch and Python
Deep understanding of neural networks, training dynamics, and optimization
Experience handling and processing large datasets
Familiarity with production inference pipelines
Strong problem‑solving skills, with the ability to dig deep into performance bottlenecks
Great to Have
Experience training speech models (ASR, TTS, SSL, etc.)
Familiarity with audio encoders, decoders, and waveform models
Experience with MLOps, experiment tracking, and deployment pipelines
Experience training or fine‑tuning models for production, or published papers
Experience with TensorRT and ONNX Runtime