Smallest
Senior Researcher - Text to Speech | San Francisco
Smallest, San Francisco, California, United States, 94199
What you’ll do
Lead research on Text-to-Speech models focused on naturalness, expressiveness, latency, and robustness
Design and train TTS systems for real-world voices across accents, languages, and speaking styles
Improve streaming and low-latency speech synthesis pipelines
Experiment with architectures, loss functions, and data strategies (multi-speaker training, style modeling, distillation, data augmentation)
Translate research ideas into production-ready TTS systems
Collaborate closely with infra, product, and voice engineering teams
What we’re looking for
Strong background in Text-to-Speech / speech generation research
Hands-on experience with deep learning frameworks (PyTorch preferred)
Experience with real-time or low-latency TTS systems
Familiarity with modern TTS architectures (Tacotron-style, FastSpeech, VITS, diffusion-based, neural vocoders)
Ability to think end-to-end: data → model → inference → deployment
Prior work in multilingual, expressive, or accented speech synthesis is a strong plus
Great to have
Publications in top speech / ML conferences
Experience deploying TTS models in real-time production
Exposure to conversational AI or voice agents
Years of Experience
3–6 years of specialized experience in speech through academia or industry
Education
Master’s or PhD in Speech, ML, or a related field
Note: We often make exceptions and hire brilliant candidates regardless of years of experience or education. Proof of work is paramount.