Sanas
Overview
Sanas is revolutionizing the way we communicate with the world’s first real-time algorithm designed to modulate accents, eliminate background noise, and magnify speech clarity. Our GDP-shifting technology sets a gold standard and is backed by seasoned founders with a track record of guiding unicorns. Established in 2020, Sanas is a 200-strong team. We have secured over $100 million in funding and partner with leading investors, including Insight Partners, Google Ventures, Quadrille Capital, General Catalyst, and Quiet Capital. We collaborate with Fortune 100 companies and are shaping the future of communication.
We’re looking for an experienced and forward-thinking Principal Data Engineer to lead the design and implementation of our end-to-end data infrastructure for industry-leading Voice AI products. This high-impact role shapes the technical vision, owns strategic architecture decisions, and mentors a growing team of data engineers focused on delivering reliable, scalable data systems for machine learning. You’ll work cross-functionally with AI research scientists, infrastructure, and product teams to ensure that data, from raw audio to training-ready features, is consistently accessible, compliant, and optimized for speed and scale. You’ll help push the boundaries of real-time Voice AI!
Key Responsibilities
Architect and lead the development of large-scale data pipelines and data lakes to ingest, transform, and serve high-quality data for AI model training, product telemetry, and analytics.
Drive long-term data infrastructure strategy across streaming and batch, feature store extensions, Iceberg/Delta Lake choices, metadata management, and lakehouse evolution.
Drive platform and infrastructure decisions, optimizing compute fleets (e.g., Ray, Spark clusters), orchestration tooling (Airflow, Dagster), and streaming stacks (Kafka, Flink).
Collaborate with AI research scientists, engineering leads, product, finance, marketing, and legal to align data architecture with business and regulatory requirements.
Advocate best practices in data governance, lineage, observability, testing, tooling, and disaster recovery across pipelines and data stores.
Act as a mentor and technical leader: review design and code, share patterns, elevate team capability, and support recruitment and hiring.
Drive build-versus-buy decisions for the data quality and observability tooling needed to achieve high data quality.
Qualifications
10+ years of experience in Data Engineering, Infrastructure, or ML Systems, with at least 2 years in a technical leadership capacity.
Expertise in building distributed batch and real-time data systems.
Expertise in databases (like PostgreSQL) and data lakes (like Snowflake, Databricks, and ClickHouse).
Experience using data processing frameworks like Spark, Flink, and Ray.
Deep experience with cloud platforms (AWS/GCP), object storage (e.g., S3), and orchestrators like Airflow and Dagster.
Strong knowledge of data lifecycle management, including privacy, security, compliance, and reproducibility.
Comfortable working in a fast-paced startup environment.
Strategic mindset and proven ability to collaborate across engineering, ML, and product teams to deliver scalable infrastructure.
Nice to Have
Familiarity with audio data and its challenges, such as large file sizes, time-series features, and metadata handling.
Experience with Voice AI models like ASR, TTS, and speaker verification.
Familiarity with real-time data processing frameworks like Kafka, Flink, Druid, and Pinot.
Familiarity with ML workflows including MLOps, feature engineering, model training, and inference.
Experience with labeling tools, audio annotation platforms, or human-in-the-loop annotation pipelines.

Joining us means contributing to the world’s first real-time speech understanding platform, which is revolutionizing contact centers and enterprises alike. Our technology empowers agents, transforms customer experiences, and drives measurable growth. You’ll be part of a team exploring the vast potential of an increasingly sonic future.