Smallest Inc.

Data Scientist (Speech) | SF

Smallest Inc., San Francisco, California, United States, 94199

What You’ll Do

Dive into model architectures (ASR / TTS / SLMs) and optimize them for specific GPUs and hardware profiles

Build, debug, and tune kernels using CUDA / Tinygrad / AMD toolchains

Convert, optimize, and benchmark models using TensorRT, ONNX, and other inference engines

Work hands-on with PyTorch to train, fine-tune, and evaluate real-time speech models

Run large-scale experiments, manage datasets, and analyze model performance at scale

Productionize models for ultra-low-latency speech workloads

Collaborate with research, infra, and product teams to push models into production

Requirements

Strong experience with CUDA, Tinygrad, AMD GPU toolkit, or similar low-level GPU programming stacks

Hands‑on proficiency with PyTorch and Python

Deep understanding of neural networks, training dynamics, and optimization

Experience handling and processing large datasets

Familiarity with production inference pipelines

Strong problem-solving skills, with the ability to dig deep into performance bottlenecks

Great to Have

Experience training speech models (ASR, TTS, SSL, etc.)

Familiarity with audio encoders, decoders, and waveform models

Experience with MLOps, experiment tracking, and deployment pipelines

Experience training or fine-tuning models for production use or published papers

Experience with TensorRT and ONNX Runtime

#J-18808-Ljbffr