Hirekeyz Inc

Azure Databricks Data Engineer - Generative AI & Advanced Lakehouse Solutions

Hirekeyz Inc, Atlanta, Georgia, United States, 30383


Role: Azure Databricks Data Engineer - Generative AI & Advanced Lakehouse Solutions

Location: Atlanta, GA (On-site)

Duration: 12 Months C2H

Job Summary: Our client is seeking an innovative Azure Databricks Data Engineer experienced in building scalable data lakehouse architectures and Generative AI (GenAI) solutions. The ideal candidate will design and operationalize advanced data pipelines using Azure Databricks, Unity Catalog, Delta Lake, and Lakeflow, while integrating LLM-based AI assistants and chatbots powered by Azure OpenAI.

Key Responsibilities:

(a) Data Engineering & Lakehouse Development:

Design, develop, and optimize data pipelines using Azure Databricks, Auto Loader, and Spark Structured Streaming for batch and real-time data processing.

Implement Delta Lake for unified, reliable, and ACID-compliant data storage.

Build Lakeflow declarative pipelines for simplified orchestration and dependency management across ingestion, transformation, and serving layers.

Apply Medallion Architecture (Bronze, Silver, Gold) principles for modular, reusable data modeling.

Utilize Unity Catalog and its metastore for centralized governance, lineage, and fine-grained data access control across workspaces.

Develop and maintain SQL Warehouses for analytics and BI consumption.

Implement SCD Type 1 & Type 2 (Slowly Changing Dimensions) logic for historical tracking and data consistency.

Build and maintain Streaming Tables to enable continuous processing and near real-time analytics.
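For candidates unfamiliar with the term, the SCD Type 2 logic mentioned above is usually implemented in Databricks as a Delta Lake MERGE; its core idea, expiring the old row version and appending the new one, can be sketched in plain Python (the function, field names, and records below are illustrative, not from this posting):

```python
from datetime import date

def scd2_upsert(dim, updates, key, today):
    """Apply SCD Type 2 logic to an in-memory dimension table.

    dim     -- list of row dicts carrying 'start_date', 'end_date', 'is_current'
    updates -- list of dicts with the business key and new attribute values
    """
    for upd in updates:
        current = next(
            (r for r in dim if r[key] == upd[key] and r["is_current"]), None
        )
        changed = current is None or any(
            current.get(k) != v for k, v in upd.items() if k != key
        )
        if not changed:
            continue  # no attribute drift; keep the current version
        if current is not None:
            # Expire the old version instead of overwriting it (historical tracking).
            current["end_date"] = today
            current["is_current"] = False
        # Append the new version as the current row.
        dim.append({**upd, "start_date": today, "end_date": None, "is_current": True})
    return dim
```

In production the same compare-expire-append pattern is expressed declaratively with a Delta `MERGE INTO ... WHEN MATCHED ... WHEN NOT MATCHED` statement rather than row-by-row Python.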

(b) Data Transformation & Optimization:

Design reusable and version-controlled data transformation workflows using dbt (Data Build Tool) within Databricks.

Optimize Spark jobs via adaptive query execution, caching, partitioning, and Z-ordering for performance and cost efficiency.

Implement alerts and notification mechanisms (e.g., via Databricks Jobs, Lakeflow, or Azure Monitor) for proactive pipeline monitoring.

Package and deploy reusable data artifacts using Databricks Asset Bundles (DABs) to standardize deployments across environments.

(c) Generative AI & Intelligent Automation:

Develop GenAI-powered assistants and bots leveraging Azure OpenAI, LangChain, and vector databases (Azure Cognitive Search, Pinecone, etc.).

Integrate Retrieval-Augmented Generation (RAG) pipelines for context-aware enterprise chatbot experiences.

Enable conversational analytics and document-based Q&A over company data sources through LLM integrations.

Collaborate with ML engineers and solution architects to deploy AI features on Azure Kubernetes Service (AKS) or Azure App Service.
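The RAG pattern named above reduces, in miniature, to a nearest-neighbor lookup over embeddings followed by prompt grounding. A toy sketch of that flow (the documents, vectors, and function names are fabricated for illustration; in practice Azure Cognitive Search or Pinecone serves the vectors and an embedding model produces them):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, index, top_k=1):
    """Return the text of the top_k documents closest to the query embedding."""
    ranked = sorted(index, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["text"] for d in ranked[:top_k]]

def build_prompt(question, context_docs):
    """Ground the LLM prompt in the retrieved context (the 'augmented' step of RAG)."""
    context = "\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The prompt returned by `build_prompt` is what would be sent to an Azure OpenAI chat completion call, so the model answers from enterprise data rather than from its training corpus alone.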

Required Skills & Qualifications:

Bachelor's or Master's degree in Computer Science, Information Systems, or a related field.

8 years of hands-on experience in data engineering on Azure Databricks.

Proficiency in PySpark, SQL, Delta Lake, and Lakeflow.

Deep understanding of Unity Catalog and metastore governance; Spark Structured Streaming and Auto Loader ingestion; ACID transactions and Delta Lake optimization; Medallion Architecture best practices; SCD1/SCD2 implementation; and Streaming Tables and SQL Warehouse operations.

Experience with dbt (Data Build Tool) for modular data transformations.

Strong knowledge of CI/CD pipelines (Azure DevOps, GitHub Actions) and environment management using Databricks Asset Bundles.

Familiarity with Generative AI frameworks (Azure OpenAI, LangChain) and LLM integration patterns.

Preferred Skills:

Experience with Azure Synapse, Power BI, and Azure Data Factory (ADF).

Familiarity with data governance tools (e.g., Azure Purview).

Understanding of MLOps/AIOps and Lakehouse AI convergence patterns.

Knowledge of cost optimization and workload tuning in Databricks.

Soft Skills:

Excellent analytical and troubleshooting capabilities.

Strong communication and documentation skills.

Ability to work cross-functionally with AI, data science, and business teams.

Passion for exploring cutting-edge data & AI innovations.


Employment Type: Full Time

Experience: 8 years

Vacancy: 1
