Hadoop PySpark Developer
TCS USA / Avance Consulting - Strongsville, Ohio, United States, 44136
Overview
Roles & Responsibilities: Understand requirements/use cases and build efficient ETL solutions using Apache Spark, Python, Kafka, and Hive, targeting the Cloudera Data Platform.
Analyze requirements/use cases, convert functional requirements into concrete technical tasks, and provide reasonable effort estimates.
Work closely with data analysts/modelers and business users to understand data requirements. Convert requirements into high-level design, low-level design, and source-to-target documents.
Design, develop, and schedule data pipelines that handle large volumes of data within SLA. Work with solution architects, technical managers, and admins to understand SLAs and system limitations, and provide efficient solutions.
Demonstrate expertise in large-volume data aggregation using Spark; know a range of performance tuning techniques and lead the team on optimization.
Develop efficient data ingestion and data governance frameworks per specification.
Improve the performance of existing Spark-based data ingestion and aggregation pipelines to meet SLAs.
Work proactively and independently with global teams to address project requirements, and articulate issues/challenges with enough lead time to mitigate project delivery risks.
Plan production implementation activities, execute change requests, and resolve production issues.
Plan and execute large data migrations and historical data rebuild activities.
Perform code reviews, code optimization, and test case reviews. Demonstrate troubleshooting skills in resolving technical issues and bugs.
Demonstrate ownership and initiative. Bring in best practices and solutions that best fit the client's problems and environment.