Smallest
Senior Researcher - Speech to Text | San Francisco
Smallest, San Francisco, California, United States, 94199
What you’ll do
Lead research on ASR models focused on accuracy, latency, and robustness
Design and train speech-to-text models for noisy, accented, and low-resource settings
Improve streaming and real-time decoding pipelines
Experiment with architectures, loss functions, and data strategies (augmentation, semi-supervised learning, distillation)
Translate research ideas into production-ready systems
Collaborate closely with infra, product, and voice engineering teams
What we’re looking for
Strong background in ASR / speech research
Hands-on experience with deep learning frameworks (PyTorch preferred)
Experience with streaming or low-latency ASR systems
Familiarity with modern ASR architectures (CTC, Transducers, attention-based, hybrid)
Ability to think end-to-end: data → model → deployment
Prior work in multilingual or accented speech is a strong plus
Great to have
Publications in top speech / ML conferences
Experience deploying models in real-time production systems
Exposure to conversational AI
Years of Experience
3-6 years of specialized experience in speech through academia or industry
Education
Master's or PhD in Speech
Note: we often make exceptions and hire brilliant candidates regardless of years of experience or education. Proof of work is paramount.