Compugra Systems
DATA ARCHITECT
Location: California Bay Area
Duration: 1 Year
Role Expectations
The Data Architect will be responsible for designing, implementing, and maintaining scalable data architectures on the Databricks platform with a strong understanding of SAP data structures, especially master data. The role requires hands‑on experience in data engineering, governance, and platform administration, as well as the ability to guide development teams through best practices, architecture decisions, and code reviews.
Skills
Technical Skills
8-15 years in data engineering/architecture, with 3-5 years specifically in Databricks.
Deep knowledge of:
PySpark, Spark SQL, Delta Lake
Unity Catalog, cluster management, lakehouse governance
Cloud architecture on Azure, AWS, or Google Cloud Platform
Strong experience with SAP data:
Extracting data from ECC/S4/BW
Understanding of SAP tables, master data structures, and business logic
Experience with IDOCs, BAPIs, ODP/ODQ sources
Strong MDM experience:
Master data modelling
Data quality frameworks
Metadata management
Golden record management
CI/CD: Git, Azure DevOps, GitHub Actions or similar.
Databricks Workflows / Jobs orchestration.
Exposure to planning systems such as SAP IBP/APO (preferred but not required).
Soft Skills
Strong communication and documentation skills.
Ability to interact with business and technical teams.
Problem‑solving with a focus on performance, reliability, and scalability.
Leadership mindset with ability to guide and upskill teams.
Detailed Skills
Architecture & Solution Design
Design end-to-end data architectures leveraging the Databricks Lakehouse Platform (Delta Lake, Unity Catalog, lakehouse governance).
Develop scalable ingestion, transformation, and consumption patterns for SAP data (ECC/S4, BW, IBP, APO, etc.).
Define data models for Master Data Management (MDM): Material, Customer, Vendor, BOM, Plant, Cost Center, Profit Center, etc. (an illustrative sketch follows this list).
Create logical/physical models aligned with business processes (planning, procurement, manufacturing, finance).
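As a purely illustrative sketch of what such an MDM model might look like on Delta Lake, the snippet below defines a minimal Material golden-record table. All column names, the SAP field mappings, and the table name are assumptions for illustration, not the actual model, which would be defined with the business teams.

```python
from pyspark.sql import SparkSession
from pyspark.sql import types as T

spark = SparkSession.builder.getOrCreate()

# Hypothetical golden-record schema for the Material master object.
# Column names loosely mirror SAP MARA fields (MATNR, MTART, MEINS, WERKS);
# the real model, keys, and versioning strategy are design decisions.
material_schema = T.StructType([
    T.StructField("material_id", T.StringType(), False),    # MATNR, business key
    T.StructField("material_type", T.StringType(), True),   # MTART
    T.StructField("base_unit", T.StringType(), True),       # MEINS
    T.StructField("plant", T.StringType(), True),           # WERKS
    T.StructField("source_system", T.StringType(), True),   # ECC / S4 / BW
    T.StructField("valid_from", T.TimestampType(), True),   # versioning
    T.StructField("is_current", T.BooleanType(), True),     # current-record flag
])

# Create the (empty) golden-record table; "mdm.gold.material" is a placeholder.
(spark.createDataFrame([], material_schema)
      .write.format("delta")
      .mode("ignore")                      # no-op if the table already exists
      .saveAsTable("mdm.gold.material"))
```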
Databricks Platform Administration
Manage workspace configuration, clusters, secrets, networking, and access control.
Set up and maintain Unity Catalog: catalogs, schemas, storage credentials, and data lineage (see the sketch after this list).
Develop CI/CD frameworks for Databricks repos, workflows, and environment promotions.
Monitor platform performance, optimize cluster sizing, and implement cost‑control measures.
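A minimal sketch of the Unity Catalog setup work described above, using the standard catalog/schema/grant statements; the catalog, schema, and group names are placeholder assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical three-level Unity Catalog hierarchy: catalog -> schema -> table.
# Names ("sap_lakehouse", "silver", "data_engineers") are illustrative only.
spark.sql("CREATE CATALOG IF NOT EXISTS sap_lakehouse")
spark.sql("CREATE SCHEMA IF NOT EXISTS sap_lakehouse.silver")

# Grant the engineering group access at the schema level so new tables
# inherit it; stewardship roles would get finer-grained grants.
spark.sql("GRANT USE CATALOG ON CATALOG sap_lakehouse TO `data_engineers`")
spark.sql("GRANT USE SCHEMA, SELECT ON SCHEMA sap_lakehouse.silver TO `data_engineers`")
```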
Infrastructure & Environment Setup
Design and configure environments (Dev/Test/Prod) across Databricks on Azure, AWS, or Google Cloud Platform.
Set up pipelines for SAP data ingestion using Azure Data Factory (ADF), Synapse, AWS Glue, SAP connectors, and ODP/ODQ and RFC/IDOC/BAPI mechanisms.
Architect secure storage layers (Bronze/Silver/Gold) following Delta Lake best practices (a minimal sketch follows this list).
Ensure integration with enterprise security standards: Key Vaults, ADLS/S3, IAM, and networking.
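A minimal sketch of the Bronze/Silver/Gold layering referenced above, assuming a hypothetical SAP MARA (material master) extract; the landing path and table names are illustrative assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: land raw SAP extracts as-is, adding only ingestion metadata.
raw = (spark.read.parquet("/mnt/landing/sap/mara/")   # illustrative path
            .withColumn("_ingested_at", F.current_timestamp()))
raw.write.format("delta").mode("append").saveAsTable("lake.bronze.sap_mara")

# Silver: cleanse and conform - trimmed keys, bad rows filtered out.
silver = (spark.table("lake.bronze.sap_mara")
               .withColumn("MATNR", F.trim("MATNR"))
               .filter(F.col("MATNR").isNotNull()))
silver.write.format("delta").mode("overwrite").saveAsTable("lake.silver.material")

# Gold: business-level aggregates for consumption (e.g., materials per plant).
gold = silver.groupBy("WERKS").agg(F.count("MATNR").alias("material_count"))
gold.write.format("delta").mode("overwrite").saveAsTable("lake.gold.material_by_plant")
```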
Data Governance & MDM
Implement governance frameworks around data quality, lineage, cataloging, and stewardship.
Define master data validations, deduplication logic, survivorship rules, and versioning.
Implement data quality rules using Delta Live Tables (DLT) expectations and audits (a sketch combining these with simple survivorship logic follows this list).
Collaborate with business teams to define golden records and standardized master data models.
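A minimal Delta Live Tables sketch of the data quality and survivorship ideas above: expectations enforce and audit quality rules, and a window-based dedup implements a simple latest-record-wins survivorship rule. Table names, keys, and rule values are assumptions.

```python
import dlt
from pyspark.sql import functions as F
from pyspark.sql.window import Window

@dlt.table(comment="Validated material master records")
@dlt.expect_or_drop("valid_key", "material_id IS NOT NULL")          # hard rule: drop violations
@dlt.expect("known_type", "material_type IN ('FERT', 'ROH', 'HALB')")  # soft rule: audited only
def material_validated():
    return dlt.read("material_bronze")   # "material_bronze" is a placeholder source table

@dlt.table(comment="Golden records: one surviving row per material_id")
def material_golden():
    # Survivorship sketch: the most recently updated record per key wins.
    w = Window.partitionBy("material_id").orderBy(F.col("updated_at").desc())
    return (dlt.read("material_validated")
               .withColumn("_rn", F.row_number().over(w))
               .filter("_rn = 1")
               .drop("_rn"))
```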
Best Practices, Standards & Reviews
Create coding standards for PySpark, SQL, Delta Lake, and ETL/ELT pipelines.
Review developer code with a focus on: query optimization; efficient Delta Lake operations (MERGE, OPTIMIZE, ZORDER); cluster cost optimization; error handling and logging patterns (an illustrative sketch follows).
Define reusable frameworks for ingestion, transformation, and reconciliation.
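An illustrative sketch of the Delta Lake operations such a review would focus on: an idempotent MERGE upsert instead of delete-and-reload, followed by OPTIMIZE/ZORDER compaction. Table names and keys are assumptions.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
updates = spark.table("lake.silver.material_updates")   # illustrative source

# Idempotent upsert: MERGE on the business key so reruns don't duplicate rows.
target = DeltaTable.forName(spark, "lake.gold.material")
(target.alias("t")
       .merge(updates.alias("s"), "t.material_id = s.material_id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())

# Compact small files and co-locate rows on the common filter column so
# downstream queries prune effectively; run periodically, not per micro-batch.
spark.sql("OPTIMIZE lake.gold.material ZORDER BY (material_id)")
```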
Development Guidance & Team Enablement
Mentor developers on Databricks architecture, PySpark patterns, and SAP data structures.
Provide technical leadership in design sessions and sprint planning.
Conduct knowledge sessions on best practices and common pitfalls.
Troubleshoot complex data pipeline issues across SAP and Databricks.