Smallest

Senior Researcher - Text to Speech | San Francisco

Smallest, San Francisco, California, United States, 94199

What you’ll do

Lead research on Text-to-Speech models focused on naturalness, expressiveness, latency, and robustness

Design and train TTS systems for real-world voices across accents, languages, and speaking styles

Improve streaming and low-latency speech synthesis pipelines

Experiment with architectures, loss functions, and data strategies (multi-speaker training, style modeling, distillation, data augmentation)

Translate research ideas into production-ready TTS systems

Collaborate closely with infra, product, and voice engineering teams

What we’re looking for

Strong background in Text-to-Speech / speech generation research

Hands-on experience with deep learning frameworks (PyTorch preferred)

Experience with real-time or low-latency TTS systems

Familiarity with modern TTS architectures (Tacotron-style, FastSpeech, VITS, diffusion-based, neural vocoders)

Ability to think end-to-end: data → model → inference → deployment

Prior work in multilingual, expressive, or accented speech synthesis is a strong plus

Great to have

Publications in top speech / ML conferences

Experience deploying TTS models in real-time production

Exposure to conversational AI or voice agents

Years of Experience

3–6 years of specialized experience in speech through academia or industry

Education

Master’s or PhD in Speech, ML, or a related field

Note: We often make exceptions and hire brilliant candidates regardless of years of experience or education.

Proof of work is paramount.
