Cartesia, Inc.
Synthetic Data Specialist
The future of AI training will be built on a foundation of high-quality synthetic data. We are looking for a creative and resourceful Synthetic Data Specialist to design and build the systems that generate training data at an unprecedented scale. This is a unique, high-impact role where you will solve critical data bottlenecks and directly accelerate our research progress.

What you'll do:
Evaluate fidelity, diversity, and usefulness of synthetic data across LLMs, audio generation, and audio understanding.
Implement techniques for steering data generation to improve model intelligence and mitigate bias.
Build automated quality control systems to validate and filter generated data.
Design synthetic datasets at large scale to develop model capabilities.
Stay on the cutting edge of research in synthetic data generation, data augmentation, and generative models.

What we're looking for:
Experience with generative models (speech, text, or multimodal).
Strong applied ML background with a focus on data-centric approaches.
Understanding of evaluation methods for synthetic data quality.
Excitement for building scalable systems that bridge research and production.
Familiarity with building large-scale distributed systems for synthetic data generation.

Our culture:
We're an in-person team based out of San Francisco. We love being in the office, hanging out together and learning from each other every day.
We ship fast. All of our work is novel and cutting edge, and execution speed is paramount. We have a high bar, and we don't sacrifice quality and design along the way.
We support each other. We have an open and inclusive culture that's focused on giving everyone the resources they need to succeed.

Our perks:
Lunch, dinner and snacks at the office.
Fully covered medical, dental, and vision insurance for employees.
401(k).
Relocation and immigration support.
Your own personal Yoshi.