David AI
Software Engineer, Machine Learning Infrastructure
David AI, San Francisco, California, United States, 94199
About David AI
David AI is the first audio data research company. We bring an R&D approach to data, developing datasets with the same rigor AI labs bring to models. Our mission is to bring AI into the real world, and we believe audio is the gateway. Speech is versatile, accessible, and human; it fits naturally into everyday life. As audio AI advances and new use cases emerge, high-quality training data is the bottleneck. This is where David AI comes in.
David AI was founded in 2024 by a team of former Scale AI engineers and operators. In less than a year, we've brought on most FAANG companies and AI labs as customers. We recently raised a $25M Series A from Jack Altman (Alt Capital), Amplify Partners, First Round Capital, and other Tier 1 investors.
Our team is sharp, humble, ambitious, and tight-knit. We're looking for the best research, engineering, product, and operations minds to join us on our mission to push the frontier of audio AI.
About our Engineering team
At David AI, our engineers build the pipelines, platforms, and models that transform raw audio into high-signal data for leading AI labs and enterprises. We're a tight-knit team of product engineers, infrastructure specialists, and machine learning experts focused on building the world's first audio data research company.
We move fast, own our work end-to-end, and ship to production daily. Our team designs real-time pipelines handling terabytes of speech data and deploys cutting-edge generative audio models.
About this role
As a Software Engineer, Machine Learning Infrastructure at David AI, you will build and scale the core infrastructure that powers our cutting-edge audio ML products. You will lead the development of the systems that enable our researchers and engineers to train, deploy, and evaluate machine learning models efficiently.
In this role, you will
- Design and maintain data pipelines for processing massive audio datasets, ensuring terabytes of data are managed, versioned, and fed into model training efficiently.
- Develop frameworks for training audio models on compute clusters, managing cloud resources, optimizing GPU utilization, and improving experiment reproducibility.
- Create robust infrastructure for deploying ML models to production, including APIs, microservices, model serving frameworks, and real-time performance monitoring.
- Apply software engineering best practices with monitoring, logging, and alerting to guarantee high availability and fault-tolerant production workloads.
- Translate research prototypes into production pipelines, working with ML engineers and data teams to support efficient data labeling and preparation.
- Evaluate and integrate new MLOps technologies and optimization techniques to enhance infrastructure velocity and reliability.
Your background looks like
- 5+ years of backend engineering, including 2+ years of ML infrastructure experience.
- Hands-on experience scaling cloud infrastructure and large-scale data processing pipelines for ML model training and evaluation.
- Proficiency with Docker, Kubernetes, and CI/CD pipelines.
- Proven ML model deployment and lifecycle management in production.
- Strong system design skills, optimizing for scale and performance.
- Proficiency in Python and deep Kubernetes experience.
Bonus points if you have
- Experience with feature stores, experiment tracking (MLflow, Weights & Biases), or custom CI/CD pipelines.
- Familiarity with large-scale data ingestion and streaming systems (Spark, Kafka, Airflow).
- A proven ability to thrive in fast-moving startup environments.
Some technologies we work with
Next.js, TypeScript, TailwindCSS, Node.js, tRPC, PostgreSQL, AWS, Trigger.dev, WebRTC, FFmpeg.
Compensation and benefits
- Rapid career growth at one of the fastest-growing Series A companies, in a new and booming industry.
- Competitive salary and equity package.
- Flexible PTO policy.
- Top-notch health, dental, and vision coverage, with 100% company reimbursement for most plans.
- Paid lunch and dinner in the office every day, through DoorDash.
- 401(k) access.