Audio AI Engineer at Zoom
As an Audio AI Engineer, you will research and develop algorithms for accent conversion, voice conversion, speech synthesis, and speech recognition on low‑latency streaming architectures. You’ll prototype and refine end‑to‑end audio models that enhance intelligibility and naturalness while maintaining speaker identity. Working closely with product and platform teams, you’ll help bring these models into real‑time communication systems and evaluate and optimize model performance across quality, latency, and scalability dimensions. Staying current with advances in speech processing, you’ll contribute to innovation through patents and internal knowledge sharing.
Responsibilities
Research, design, and develop algorithms for accent conversion, voice conversion, speech synthesis, and automatic speech recognition with a focus on low‑latency streaming architectures.
Prototype and fine‑tune end‑to‑end audio models that improve intelligibility and naturalness while preserving speaker identity.
Collaborate with product and platform teams to integrate models into real‑time video and audio communication systems.
Analyze and optimize model performance across speech quality, latency, robustness, and scalability.
Stay current with the latest speech‑processing research and contribute to the community through patents and internal knowledge sharing.
Qualifications
PhD or equivalent experience in a relevant field such as streaming, voice conversion, TTS, or ASR.
Proficiency in deep‑learning frameworks like PyTorch or TensorFlow.
Strong programming skills in Python and/or C/C++.
Knowledge of sequence‑modeling architectures (Transformers, RNNs, diffusion models, conformers).
Experience developing and deploying low‑latency, real‑time speech or audio models with streaming architectures.
Familiarity with model compression and acceleration techniques (quantization, pruning, distillation).
Experience with real‑time audio systems in networked communication environments.
Publications in top‑tier conferences such as ICASSP, INTERSPEECH, NeurIPS, or ICLR.
Fluency in Mandarin (required).
Salary Range: $127,700–$255,400. Pay is based on qualifications and experience.
Benefits: Zoom offers a comprehensive benefits program that supports physical, mental, emotional, and financial health. Options include health, dental, and vision coverage, paid time off, parental leave, and equity compensation.
Anticipated Position Close Date: 11/06/25
About Zoom: Zoom helps people stay connected to collaborate and communicate better with products like Zoom Contact Center, Zoom Phone, and Zoom Apps. We are committed to fair hiring practices and support accommodations throughout the hiring process.