Sanas

Principal Data Engineer

Sanas, Palo Alto, California, United States, 94306

Overview

Sanas is hiring a Principal Data Engineer to lead the design and implementation of our end-to-end data infrastructure for industry-leading Voice AI products. This high-impact role shapes the technical vision, owns architecture decisions, and mentors a growing team of Data Engineers focused on delivering reliable and scalable data systems for machine learning. Join a fast-moving environment where data, from raw audio to training-ready features, is accessible, compliant, and optimized for speed and scale. Sanas is a 200-strong team, established in 2020, with a track record of rapid growth and collaboration with major customers and investors. This description reflects the responsibilities and requirements for the Principal Data Engineer role.

Key Responsibilities

Architect and lead the development of large-scale data pipelines and data lakes to ingest, transform, and serve high-quality data for AI model training, product telemetry, and analytics.
Drive long-term data infrastructure strategy across streaming and batch, feature store extensions, Iceberg/Delta lake choices, metadata management, and lakehouse evolution.
Drive platform and infrastructure decisions, optimizing compute fleets (e.g., Ray, Spark clusters), orchestration tooling (Airflow, Dagster), and streaming stacks (Kafka, Flink).
Collaborate with AI research scientists, engineering leads, product, finance, marketing, and legal to align data architecture with business and regulatory requirements.
Advocate best practices in data governance, lineage, observability, testing, tooling, and disaster recovery across pipelines and data stores.
Act as a mentor and technical leader: review design and code, share patterns, elevate team capability, and support recruitment and hiring.
Drive build-vs-buy decisions for data quality and observability tooling to achieve high data quality.

Qualifications

10+ years of experience in Data Engineering, Infrastructure, or ML Systems, with at least 2 years in a technical leadership capacity.
Expertise in building distributed batch and real-time data systems.
Expertise in databases (like Postgres) and Data Lakes (like Snowflake, Databricks, and ClickHouse).
Experience using data processing frameworks like Spark, Flink, and Ray.
Deep experience with cloud platforms (AWS/GCP), object storage (e.g., S3), and orchestrators like Airflow and Dagster.
Strong knowledge of data lifecycle management, including privacy, security, compliance, and reproducibility.
Comfortable working in a fast-paced startup environment.
Strategic mindset and proven ability to collaborate across engineering, ML, and product teams to deliver infrastructure that scales with the business.

Nice To Have

Familiarity with audio data and its unique challenges, such as large file sizes, time-series features, and metadata handling, is a strong plus.
Experience with Voice AI models like ASR, TTS, and speaker verification.
Familiarity with real-time data processing frameworks like Kafka, Flink, Druid, and Pinot.
Familiarity with ML workflows including MLOps, feature engineering, model training, and inference.
Experience with labeling tools, audio annotation platforms, or human-in-the-loop annotation pipelines.

The Pay Range For This Role Is

250,000 - 350,000 USD per year (Palo Alto Office)

Employment details

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Industry: Software Development

We are an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status.