Latcha+Associates
This role will focus on building and maintaining scalable data pipelines in Azure Databricks, transforming large volumes of automotive and marketing data into governed, analytics-ready Delta tables. The ideal candidate is highly skilled in PySpark, SQL, and Azure data services, with strong attention to detail and a passion for clean, reliable data. This position plays a central role in powering our MDM platform, building and maintaining the pipelines that feed our CRM application, AI initiatives, and business intelligence solutions across Latcha's enterprise data environment.
Key Responsibilities
• Design, build, and maintain scalable data pipelines in Azure Databricks to process structured and unstructured marketing and automotive data across Bronze, Silver, and Gold layers (a brief PySpark sketch follows this list).
• Develop and optimize PySpark ETL workflows for ingesting data from external vendors (Experian, OEM, Dealer Tire, Meta, Basis, etc.) using Azure Blob, Volumes, and Delta tables.
• Implement robust data quality frameworks using Great Expectations and custom validation scripts to ensure data completeness, consistency, and accuracy.
• Collaborate with data architects and analysts to model dealer-centric and customer-centric data for reporting, analytics, and machine learning use cases.
• Automate and monitor pipeline executions via Databricks Jobs and Azure Data Factory; manage schema evolution, partitions, and performance tuning.
• Contribute to the development of internal Python utilities and libraries for schema alignment, transformations, and reusable ETL logic.
• Work closely with the integrations and AI/ML engineering teams to operationalize Gold-layer datasets for APIs, dashboards, and machine learning models.
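
To give candidates a concrete feel for the day-to-day work, here is a minimal sketch of a Bronze-to-Silver promotion in PySpark over Delta tables. All table names, columns, and values are hypothetical placeholders, not Latcha's actual schema.

```python
# Minimal Bronze -> Silver sketch; table and column names are
# hypothetical illustrations only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bronze_to_silver").getOrCreate()

# Read raw vendor records landed in the Bronze layer.
bronze = spark.read.table("bronze.vendor_leads")

# Standardize, deduplicate, and stamp processing metadata for Silver.
silver = (
    bronze
    .withColumn("email", F.lower(F.trim(F.col("email"))))
    .dropDuplicates(["lead_id"])
    .withColumn("load_date", F.current_date())
    .withColumn("processed_at", F.current_timestamp())
)

# Write an analytics-ready Delta table, partitioned by load date.
(
    silver.write.format("delta")
    .mode("overwrite")
    .partitionBy("load_date")
    .saveAsTable("silver.vendor_leads")
)
```
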
Required Skills
• Advanced proficiency in PySpark and SQL (Databricks SQL, Delta Lake).
• Strong understanding of the Azure data ecosystem: Databricks, Data Factory, Blob Storage, Volumes, Key Vault, and Unity Catalog.
• Hands-on experience building ETL pipelines using the Delta medallion architecture (Bronze → Silver → Gold).
• Proficiency with Git, CI/CD pipelines, and version control best practices.
• Ability to design efficient data models with partitioning, clustering, and schema enforcement (illustrated in the sketch after this list).
• Experience working with JSON, Parquet, CSV, and other structured file types.
• Strong understanding of data governance, schema alignment, and error handling in distributed systems.
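
As a small illustration of schema enforcement on Delta writes, the sketch below declares an explicit schema for a partitioned table; the table, columns, and partition key are hypothetical examples only.

```python
# Hypothetical illustration of Delta schema enforcement; the table,
# columns, and names are examples, not Latcha's actual model.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DateType

spark = SparkSession.builder.appName("schema_enforcement").getOrCreate()

# Declare the contract explicitly rather than relying on inference.
schema = StructType([
    StructField("dealer_id", StringType(), nullable=False),
    StructField("region", StringType(), nullable=True),
    StructField("load_date", DateType(), nullable=True),
])

df = spark.createDataFrame([("D-001", "Midwest", None)], schema)

# Delta enforces this schema on subsequent writes: a mismatched DataFrame
# fails fast instead of silently corrupting the table. Opting into schema
# evolution requires an explicit .option("mergeSchema", "true").
(
    df.write.format("delta")
    .mode("append")
    .partitionBy("load_date")
    .saveAsTable("silver.dealers")
)
```
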
Nice-to-Have Skills
• Experience with Great Expectations, Soda, or similar data quality frameworks.
• Familiarity with FastAPI and exposing Delta tables via REST APIs.
• Knowledge of MLflow, feature stores, and model lifecycle management in Databricks.
• Experience with Power BI and Fabric Mirroring for analytics-layer integration.
• Exposure to AI/LLM-based automation and RAG pipelines (preferred but not required).
• Understanding of Delta MERGE logic, schema evolution, and optimization such as Z-ordering and caching (a sketch follows this list).
• Experience with Azure DevOps or GitHub Actions for CI/CD automation.
• Working knowledge of Docker and containerized deployments.
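
For reference, here is a minimal sketch of the Delta MERGE upsert pattern mentioned above; the tables and join key are hypothetical examples.

```python
# Hypothetical Delta MERGE upsert; table names and keys are illustrative.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta_merge").getOrCreate()

updates = spark.read.table("bronze.customer_updates")
target = DeltaTable.forName(spark, "silver.customers")

# Upsert: update matched rows, insert new ones.
(
    target.alias("t")
    .merge(updates.alias("u"), "t.customer_id = u.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# Optional maintenance pass; OPTIMIZE ... ZORDER BY is Databricks-specific.
spark.sql("OPTIMIZE silver.customers ZORDER BY (customer_id)")
```
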
Experience & Qualifications
• 3-6 years of experience in data engineering or analytics engineering, ideally within an Azure Databricks environment.
• Bachelor's degree in Computer Science, Information Systems, Data Engineering, or a related field.
• Prior experience with marketing, CRM, or automotive datasets is highly desirable.
• Strong communication skills and the ability to collaborate in cross-functional teams.