Mega Cloud Lab
Overview
We're seeking a visionary Data Architect with deep expertise in Databricks to lead the design, implementation, and optimization of our enterprise data architecture. You'll be instrumental in shaping scalable data solutions that power analytics, AI, and business intelligence across the organization. If you thrive in a fast-paced environment, love solving complex data challenges, and are passionate about cloud-native platforms like Databricks on AWS, we want to hear from you.
Responsibilities
- Design & Architecture: Architect scalable, secure Lakehouse and data platform solutions using Databricks, Spark, Delta Lake, and cloud storage.
- Implementation & Development: Lead implementation of ETL/ELT pipelines (batch & real-time), Databricks notebooks, jobs, Structured Streaming, and PySpark/Scala code for production workloads (a minimal ingestion sketch follows this list).
- Data Modeling & Pipelines: Define canonical data models, schema evolution strategies, and optimized ingestion patterns to support analytics, BI, and ML use cases.
- Performance & Cost Optimization: Tune Spark jobs, Delta tables, cluster sizing, and storage to balance performance, latency, and cost-efficiency (see the table-maintenance sketch below).
- Governance & Compliance: Implement data quality, lineage, access controls, and compliance measures (Unity Catalog, RBAC) to meet internal and regulatory standards (see the grants sketch below).
- Migration & Modernization: Lead migrations from legacy data warehouses to modern cloud data architectures (Databricks, Snowflake), ensuring minimal disruption.
- DevOps & Automation: Define CI/CD and Infrastructure as Code best practices for data platform deployments (Terraform, CI pipelines, Databricks Jobs API), as in the Jobs API sketch below.
- Leadership & Mentorship: Mentor engineers, drive architecture reviews, and collaborate with stakeholders to translate business needs into technical solutions.
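
To give candidates a concrete feel for the pipeline work above, here is a minimal PySpark Structured Streaming sketch that incrementally ingests JSON events from cloud storage into a Delta table. The bucket path, schema, and table name are illustrative placeholders, not references to our platform; on Databricks this pattern is often built with Auto Loader instead of the plain file source.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("events-ingest").getOrCreate()

# Hypothetical event schema; a production pipeline would pair this with an
# explicit schema-evolution strategy on the target Delta table.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("occurred_at", TimestampType()),
])

# Incrementally pick up newly arrived JSON files (bucket path is illustrative).
events = (
    spark.readStream
    .schema(event_schema)
    .json("s3://example-bucket/raw/events/")
)

# Append to a Delta table; the checkpoint gives exactly-once recovery, and
# availableNow processes the current backlog then stops, suiting a scheduled job.
(
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
    .outputMode("append")
    .trigger(availableNow=True)
    .toTable("analytics.bronze.events")
)
```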
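On the optimization side, Delta table maintenance is commonly scripted as SQL from a scheduled job. A sketch, assuming a Databricks runtime where OPTIMIZE, ZORDER BY, and VACUUM are available; the table and column names continue the illustrative example above:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact small files and co-locate rows by a frequently filtered column,
# cutting scan time and cost on large tables.
spark.sql("OPTIMIZE analytics.bronze.events ZORDER BY (event_type)")

# Drop data files no longer referenced by the table, subject to the default
# retention window, to reclaim storage.
spark.sql("VACUUM analytics.bronze.events")
```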
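Governance work in Unity Catalog is likewise expressed as SQL grants over its three-level namespace. A sketch with hypothetical catalog, schema, and group names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Grant an analyst group read access: USE on the containing catalog and
# schema, then SELECT across the schema's tables (names are hypothetical).
spark.sql("GRANT USE CATALOG ON CATALOG analytics TO `data-analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA analytics.bronze TO `data-analysts`")
spark.sql("GRANT SELECT ON SCHEMA analytics.bronze TO `data-analysts`")
```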
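Deployment automation frequently goes through the Databricks Jobs API (2.1) from a CI pipeline. A minimal sketch that creates a single-task notebook job over REST; the workspace URL, token, notebook path, and cluster spec are all placeholders:

```python
import os
import requests

# Workspace URL and token come from the environment; both stand in for
# whatever secret management the CI pipeline actually uses.
host = os.environ["DATABRICKS_HOST"]   # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]

# A minimal single-task job; the notebook path and cluster spec are illustrative.
job_spec = {
    "name": "nightly-events-ingest",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/data-platform/ingest_events"},
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        }
    ],
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
    timeout=30,
)
resp.raise_for_status()
print("Created job", resp.json()["job_id"])
```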
Required Skills & Qualifications
- 12+ years in data architecture with 5+ years hands-on experience in Databricks or equivalent Lakehouse platforms.
- Strong experience with Snowflake and modern data warehousing patterns.
- Cloud & Platform: Proven experience with AWS (preferably Databricks on AWS), including familiarity with S3, IAM, Glue, and networking for secure deployments.
- Core Technologies: Deep proficiency in Apache Spark, Delta Lake, PySpark/Scala, SQL, and performance tuning.
- Data Engineering: Experience designing ETL/ELT pipelines, data modeling, partitioning strategies, and data quality frameworks.
- Automation & DevOps: Familiarity with CI/CD, Terraform (or other IaC), the Databricks Jobs API, and orchestration and transformation tools (Airflow, Prefect, dbt).
- Governance & Security: Knowledge of data governance, metadata management, lineage, RBAC, and regulatory compliance best practices.
- Communication: Excellent stakeholder management, documentation, and cross-functional collaboration skills.
Preferred Qualifications
- Databricks Certified Data Engineer or Architect.
- Experience with MLflow, Unity Catalog, and Lakehouse architecture.
- Background in machine learning, AI, or advanced analytics.
- Experience with tools like Apache Airflow, dbt, or Power BI/Tableau.
Seniority level
- Mid-Senior level
Employment type
- Contract
Job function
- Engineering and Information Technology
Industries
- IT Services and IT Consulting