Data Services Engineer

Latcha+Associates, Farmington Hills, Michigan, United States

This role will focus on building and maintaining scalable data pipelines in Azure Databricks, transforming large volumes of automotive and marketing data into governed, analytics-ready Delta tables. The ideal candidate is highly skilled in PySpark, SQL, and Azure data services, with strong attention to detail and a passion for clean, reliable data. This position plays a key role in powering our MDM platform and in building and maintaining the pipelines that feed our CRM application, AI initiatives, and business intelligence solutions across Latcha's enterprise data environment.

Key Responsibilities

• Design, build, and maintain scalable data pipelines in Azure Databricks to process structured and unstructured marketing and automotive data across Bronze, Silver, and Gold layers.
• Develop and optimize PySpark ETL workflows for ingesting data from external vendors (Experian, OEM, Dealer Tire, Meta, Basis, etc.) using Azure Blob, Volumes, and Delta tables.
• Implement robust data quality frameworks using Great Expectations and custom validation scripts to ensure data completeness, consistency, and accuracy.
• Collaborate with data architects and analysts to model dealer-centric and customer-centric data for reporting, analytics, and machine learning use cases.
• Automate and monitor pipeline executions via Databricks Jobs and Azure Data Factory; manage schema evolution, partitions, and performance tuning.
• Contribute to the development of internal Python utilities and libraries for schema alignment, transformations, and reusable ETL logic.
• Work closely with the integrations and AI/ML engineering teams to operationalize gold-layer datasets for APIs, dashboards, and machine learning models.

Required Skills

• Advanced proficiency in PySpark and SQL (Databricks SQL, Delta Lake).
• Strong understanding of the Azure data ecosystem: Databricks, Data Factory, Blob Storage, Volumes, Key Vault, and Unity Catalog.
• Hands-on experience building ETL pipelines using the Delta architecture (Bronze, Silver, Gold).
• Proficiency with Git, CI/CD pipelines, and version control best practices.
• Ability to design efficient data models with partitioning, clustering, and schema enforcement.
• Experience working with JSON, Parquet, CSV, and other structured file types.
• Strong understanding of data governance, schema alignment, and error handling in distributed systems.

Nice-to-Have Skills

• Experience with Great Expectations, Soda, or similar data quality frameworks.
• Familiarity with FastAPI and exposing Delta tables via REST APIs.
• Knowledge of MLflow, feature stores, and model lifecycle management in Databricks.
• Experience with Power BI and Fabric Mirroring for analytics-layer integration.
• Exposure to AI/LLM-based automation and RAG pipelines (preferred but not required).
• Understanding of Delta MERGE logic, schema evolution, and optimization (Z-ordering, caching).
• Experience with Azure DevOps or GitHub Actions for CI/CD automation.
• Working knowledge of Docker and containerized deployments.

Experience & Qualifications

• 3-6 years of experience in data engineering or analytics engineering, ideally within an Azure Databricks environment.
• Bachelor's degree in Computer Science, Information Systems, Data Engineering, or a related field.
• Prior experience with marketing, CRM, or automotive datasets is highly desirable.
• Strong communication skills and the ability to collaborate in cross-functional teams.