United Language Group
Join the Audio AI Engineer team at United Language Group, working to make communication accessible to everyone through real‑time, multilingual interpretation and AI‑powered tools. Propio is a leader in real‑time interpretation and multilingual language services. We connect people with the information they need across language, culture, and modality, building AI‑powered tools to enhance interpreter workflows, automate multilingual insights, and scale communication quality across industries.

Key Responsibilities
Design and optimize end‑to‑end Speech‑to‑Speech pipelines that integrate ASR, translation, and TTS with minimal latency
Build bidirectional interpretation systems that handle turn‑taking, speaker identification, and context preservation across language boundaries
Collaborate with the Audio/Speech Engineer to optimize latency, quality, and robustness of speech components in the full pipeline
Work with the Staff ML Engineer to design efficient inference architectures and deployment strategies for real‑time streaming systems
Develop streaming ASR and TTS systems capable of handling continuous, overlapping speech in interpretation scenarios
Benchmark and optimize latency across all pipeline stages (speech capture, recognition, translation, synthesis)
Integrate speaker diarization, acoustic environment adaptation, and speech enhancement into interpretation workflows
Partner with linguists and product teams to validate interpretation quality and gather domain‑specific feedback

Requirements
Bachelor's or Master's degree in Electrical Engineering, Computer Science, or a related field
3+ years of experience in speech processing, audio engineering, or conversational AI systems
Deep expertise in ASR, TTS, and streaming audio architectures
Proficiency in Python and ML frameworks, and experience with real‑time signal processing
Experience building low‑latency production systems and optimizing for inference performance
Strong understanding of interpretation workflows, multilingual challenges, and speech quality metrics

Preferred Qualifications
Experience building speech‑to‑text pipelines or hybrid ASR + LLM systems
Familiarity with real‑time audio processing or latency‑sensitive applications

Seniority Level
Mid‑Senior level

Employment Type
Full‑time

Job Function
Engineering and Information Technology

Kansas City, MO