Incedo Inc.
Databricks Data Lead
Location
San Rafael, CA - Bay Area, California (local candidates only)
Hybrid: 3 days/week onsite at the client office in San Rafael, CA
Experience Level
8-12 years in data engineering, analytics engineering, or distributed data systems
Role Overview
We are seeking a Databricks Data Lead to support the design, implementation, and optimization of cloud-native data platforms built on the Databricks Lakehouse architecture. This is a hands-on, engineering-driven role requiring deep experience with Apache Spark, Delta Lake, and scalable data pipeline development, combined with early-stage architectural responsibilities.
The role involves close onsite collaboration with client stakeholders, translating analytical and operational requirements into robust, high-performance data architectures while adhering to best practices for data modeling, governance, reliability, and cost efficiency.
Key Responsibilities
Design, develop, and maintain batch and near-real-time data pipelines using Databricks, PySpark, and Spark SQL
Implement medallion (Bronze/Silver/Gold) lakehouse architectures, ensuring proper data quality, lineage, and transformation logic across layers
Build and manage Delta Lake tables, including schema evolution, ACID transactions, time travel, and optimized data layouts
Apply performance optimization techniques such as partitioning strategies, Z‑Ordering, caching, broadcast joins, and Spark execution tuning
Support dimensional and analytical data modeling for downstream consumption by BI tools and analytics applications
Assist in defining data ingestion patterns (batch, incremental loads, CDC, and streaming where applicable)
Troubleshoot and resolve pipeline failures, data quality issues, and Spark job performance bottlenecks
Collaborate onsite with client data engineers, analysts, and business stakeholders to gather technical requirements, validate implementation approaches, and maintain technical documentation covering data flows, transformation logic, table designs, and architectural decisions
Contribute to code reviews, CI/CD practices, and version control workflows to ensure maintainable and production‑grade solutions
Required Skills & Qualifications
Strong hands-on experience with the Databricks Lakehouse Platform
Deep working knowledge of Apache Spark internals, including:
Spark SQL
DataFrames/Datasets
Shuffle behavior and execution plans
Advanced Python (PySpark) and SQL development skills
Solid understanding of data warehousing concepts, including:
Star and snowflake schemas
Analytical vs. operational workloads
Experience working with cloud data platforms on AWS, Azure, or GCP
Practical experience with Delta Lake, including:
Schema enforcement and evolution
Data compaction and optimization
Proficiency with Git-based version control and collaborative development workflows
Strong verbal and written communication skills for client‑facing technical discussions
Ability and willingness to work onsite 3 days/week in San Rafael, CA
Nice‑to‑Have Skills
Exposure to Databricks Unity Catalog, data governance, and access control models
Experience with Databricks Workflows, Apache Airflow, or Azure Data Factory for orchestration
Familiarity with streaming frameworks (Spark Structured Streaming, Kafka) and/or CDC patterns
Understanding of data quality frameworks, validation checks, and observability concepts
Experience integrating Databricks with BI tools such as Power BI, Tableau, or Looker
Awareness of cost optimization strategies in cloud‑based data platforms
Education
Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience)
Why This Role
Hands‑on ownership of Databricks Lakehouse implementations in a real‑world enterprise environment
Direct client‑facing exposure with a leading Bay Area organization
Opportunity to evolve from senior data engineering into formal data architecture responsibilities
Strong growth path toward Senior Databricks Architect / Lead Data Platform Engineer
Seniority Level
Mid‑Senior level
Employment Type
Full‑time
Job Function
Consulting
Industries
IT Services and IT Consulting
Hospitals and Health Care
Biotechnology Research
Benefits
Medical insurance
Vision insurance
401(k)