Mastech Digital

Lead Data Engineer - W2

Mastech Digital, Strongsville, Ohio, United States, 44136

Overview

We are seeking an experienced Data Engineer to support a large-scale data platform migration from Cloudera Hadoop to Apache Iceberg. This is a critical role in our data modernisation initiative, where you’ll help redesign legacy data lake structures and ETL pipelines for a modern, cloud-native architecture.

Responsibilities

Lead the migration of datasets and ETL workflows from Cloudera Hadoop (Hive, Impala, HDFS, etc.) to an Apache Iceberg-based architecture.
Analyse existing data pipelines and storage formats (e.g., Parquet, ORC) to plan and execute a smooth migration strategy.
Design and implement scalable data ingestion and transformation pipelines using Apache Spark, Flink, or equivalent tools.
Optimise data partitioning, schema evolution, compaction, and metadata management using Iceberg best practices.
Integrate Iceberg tables with query engines like Trino or Presto to support data analytics use cases.
Ensure compatibility and data quality during the migration phase through robust testing, validation, and lineage tracking.
Establish monitoring, logging, and performance tuning for migrated pipelines and Iceberg tables.
Document architecture decisions and provide technical guidance to internal teams throughout the migration process.
Optimise data ingestion and feature retrieval pipelines for real-time and batch processing.

Qualifications

Must Have: Python, SQL, Hadoop, Apache Spark, Kafka, AWS
Experience with Apache Iceberg, Hadoop ecosystems, and distributed data processing

Seniorities & Employment

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Industries: IT Services and IT Consulting
Location: Cleveland, OH
Compensation: $77,000 - $202,000