Worth AI is seeking a Principal Data Engineer to join our innovative team. At Worth AI, we are on a mission to revolutionize decision-making with the power of artificial intelligence while fostering an environment of collaboration and adaptability, aiming to make a meaningful impact in the tech landscape. Our team values include extreme ownership, one team, and creating raving fans among both our employees and our customers.
Responsibilities
What you will do
Architecture & Strategy
Define end-to-end data architecture (lake/lakehouse/warehouse, batch/streaming, CDC, metadata). Set standards for schemas, contracts, orchestration, storage layers, and semantic/metrics models. Publish roadmaps, ADRs/RFCs, and north star target states; guide build vs. buy decisions.
Platform & Pipelines
Design and build scalable, observable ELT/ETL and event pipelines. Establish ingestion patterns (CDC, file, API, message bus) and schema-evolution policies. Provide self-service tooling for analysts/scientists (dbt, notebooks, catalogs, feature stores). Ensure workflow reliability (idempotency, retries, backfills, SLAs).
Data Quality & Governance
Define dataset SLAs/SLOs, freshness, lineage, and data certification tiers. Enforce contracts and validation tests; deploy anomaly detection and incident runbooks. Partner with governance on cataloging, PII handling, retention, and access policies.
Reliability, Performance & Cost
Lead capacity planning, partitioning/clustering, and query optimization. Introduce SRE-style practices for data (error budgets, postmortems). Drive FinOps for storage/compute; monitor and reduce cost per TB/query/job.
Security & Compliance
Implement encryption, tokenization, and row/column-level security; manage secrets and audits. Align with SOC 2 and privacy regulations (e.g., GDPR/CCPA; HIPAA if applicable).
ML & Analytics Enablement
Deliver versioned, documented datasets/features for BI and ML. Operationalize training/serving data flows, drift signals, and feature-store governance. Build and maintain the semantic layer and metrics consistency for experimentation/BI.
Leadership & Collaboration
Provide technical leadership across squads; mentor senior/staff engineers. Run design reviews and drive consensus on complex trade-offs. Translate business goals into data products with product/analytics leaders.
Qualifications
10+ years in data engineering, including 3+ years as a staff/principal engineer or in an equivalent scope. Proven leadership of company-wide data architecture and platform initiatives. Deep experience with at least one cloud platform (AWS) and a modern warehouse or lakehouse (e.g., Snowflake, Redshift, Databricks). Strong SQL and proficiency in one programming language (Python or Scala/Java). Experience with orchestration (Airflow/Dagster/Prefect), transformations (dbt or equivalent), and streaming (Kafka, Kinesis, or Pub/Sub). Data modeling (3NF, star schema, data vault) and semantic/metrics layers. Data quality testing, lineage, and observability in production environments. Security best practices: RBAC/ABAC, encryption, key management, and auditability.
Nice to Have
Feature stores and ML data ops; experimentation frameworks. Cost optimization at scale; multi-tenant architectures. Governance tools (DataHub/Collibra/Alation), OpenLineage, and testing frameworks (Great Expectations/Deequ). Compliance exposure (SOC 2, GDPR/CCPA; HIPAA/PCI where relevant). Experience building model features from complex third-party data (KYB/KYC, credit bureaus, fraud-detection APIs).
Benefits
Health Care Plan (Medical, Dental & Vision)
Retirement Plan (401k, IRA)
Life Insurance
Unlimited Paid Time Off
9 Paid Holidays
Family Leave
Work From Home
Free Food & Snacks (Access to Industrious Co-working Membership!)
Wellness Resources