Zendesk

Senior AI Agent Engineer - Voice AI

Zendesk, Adah, Pennsylvania, United States

Overview

The Agentic Tribe is revolutionizing the chatbot and voice assistance landscape with Gen3, a cutting-edge AI Agent system that is goal-oriented, dynamic, and truly conversational. Gen3 leverages a multi-agent architecture and advanced language models to deliver personalized experiences, handle complex tasks, and respond to off-script inquiries in real time.

We are seeking a passionate and experienced Senior AI Agent Engineer with a strong focus on Voice AI to join our team. You will innovate at the forefront of conversational AI, engineering autonomous agents that can listen, understand, and speak with human-like fluidity. You will build the cognitive architecture for voice applications, creating systems that can reason, plan, and execute complex tasks through seamless, low-latency spoken dialogue. A key part of your role is to communicate complex technical concepts to both technical and non-technical stakeholders.

What you will do

Design and develop robust, stateful, and scalable voice-first AI agents using Python, optimized for real-time voice interactions, managing turn-taking, interruptions, and low-latency responses.
Integrate real-time Speech-to-Text (STT), Text-to-Speech (TTS), and Voice Activity Detection (VAD) services to create a seamless conversational flow.
Connect voice agents with enterprise systems, databases, and third-party APIs to enable end-to-end automated workflows initiated and managed through voice.
Establish and own the evaluations for voice agent performance and behavior, iterating to improve performance, reliability, and the overall user experience.
Build end-to-end conversational flows with reasoning, planning, and dynamic tool use beyond pre-scripted experiences.
Collaborate with product managers, ML scientists, and engineers to deeply understand user needs and voice interaction goals.
Implement fallback, recovery, and error-handling strategies to address noisy audio input or speech recognition inaccuracies.
Define and track voice-specific evaluation metrics (e.g., word error rate, latency, conversational naturalness).
Develop observability tools and guardrails to monitor performance, ensure safety, and handle edge cases in spoken interactions.
Document development, architecture decisions, and research findings to share knowledge across the team.

Requirements

LLM-Oriented System Design: Strong experience building multi-step, tool-using agents (LangChain, Autogen). Familiarity with prompt engineering, context management, and reasoning strategies such as Chain-of-Thought and ReAct.

Voice AI Expertise:
Experience building low-latency, streaming voice applications.
Expertise in integrating and managing real-time STT/TTS models and APIs.
Proficiency with Voice Activity Detection (VAD), noise suppression, and robust barge-in/interruption logic.
Experience with third-party voice AI APIs, including STT and TTS services from providers such as OpenAI, Deepgram, and ElevenLabs.
Understanding of latency, timing, and streaming audio constraints.

Tool Integration & APIs: Comfortable connecting agents to external APIs, tools, and databases in secure environments.
RAG (Retrieval-Augmented Generation): Building pipelines with vector stores, chunking strategies, and hybrid retrieval.
Evaluation & Observability: Implementing and using monitoring tools and evaluation frameworks (Braintrust) to score AI Agents.
Safety & Reliability: Familiarity with prompt injection defense, guardrails (Rebuff, Guardrails AI), and failover logic.
Performance Optimization: Token budget and latency management using caching and model routing.
Programming & Deployment: Expert in Python, FastAPI, and LLM SDKs. Experience deploying AI apps to cloud platforms (AWS, GCP, Azure) using CI/CD.

Nice-to-have

M.S. / Ph.D. in Computer Science, NLP, Machine Learning, or related field
Background in spoken dialogue systems or conversational UX design
Familiarity with real-time streaming architecture (e.g., WebRTC, gRPC, socket.io)
Multilingual ASR/TTS pipeline experience

About Zendesk

Zendesk builds software for better customer relationships. It empowers organizations to improve customer engagement and understand their customers, with products designed to be easy to use and implement. Zendesk serves more than 100,000 paid customer accounts in over 150 countries. It is based in San Francisco with operations globally and offers a hybrid working model. Zendesk is an equal opportunity employer and fosters diversity, equity, and inclusion in the workplace. The company takes steps to accommodate applicants with disabilities and provides a Candidate Privacy Notice describing data processing related to recruiting.
