Sigmaways Inc
Senior Data Engineer - Spark, Airflow
Sigmaways Inc, San Francisco, California, United States, 94199
We are seeking an experienced Data Engineer to design and optimize scalable data pipelines that drive our global data and analytics initiatives. In this role, you will leverage technologies such as Apache Spark, Airflow, and Python to build high-performance data processing systems and ensure data quality, reliability, and lineage across Mastercard's data ecosystem. The ideal candidate combines strong technical expertise with hands-on experience in distributed data systems, workflow automation, and performance tuning to deliver impactful, data-driven solutions at enterprise scale.

Responsibilities:
- Design and optimize Spark-based ETL pipelines for large-scale data processing.
- Build and manage Airflow DAGs for scheduling, orchestration, and checkpointing.
- Implement partitioning and shuffling strategies to improve Spark performance.
- Ensure data lineage, quality, and traceability across systems.
- Develop Python scripts for data transformation, aggregation, and validation.
- Execute and tune Spark jobs using spark-submit.
- Perform DataFrame joins and aggregations for analytical insights.
- Automate multi-step processes through shell scripting and variable management.
- Collaborate with data, DevOps, and analytics teams to deliver scalable data solutions.

Qualifications:
- Bachelor's degree in Computer Science, Data Engineering, or a related field (or equivalent experience).
- At least 7 years of experience in data engineering or big data development.
- Strong expertise in Apache Spark architecture, optimization, and job configuration.
- Proven experience authoring, scheduling, checkpointing, and monitoring Airflow DAGs.
- Skilled in data shuffling, partitioning strategies, and performance tuning in distributed systems.
- Expertise in Python programming, including data structures and algorithmic problem-solving.
- Hands-on experience with Spark DataFrames and PySpark transformations (joins, aggregations, filters).
- Proficient in shell scripting, including managing and passing variables between scripts.
- Experienced with spark-submit for deployment and tuning.
- Solid understanding of ETL design, workflow automation, and distributed data systems.
- Excellent debugging and problem-solving skills in large-scale environments.
- Experience with AWS Glue, EMR, Databricks, or similar Spark platforms.
- Knowledge of data lineage and data quality frameworks such as Apache Atlas.
- Familiarity with CI/CD pipelines, Docker/Kubernetes, and data governance tools.

Seniority level: Mid-Senior level
Employment type: Contract
Job function: Information Technology, Banking