TCS USA / Avance Consulting

Hadoop PySpark Developer

TCS USA / Avance Consulting, Strongsville, Ohio, United States, 44136


Skill: Hadoop PySpark Developer

Must-Have Technical/Functional Skills: Cloudera Data Platform, PySpark, Python, Hive/MapReduce, Linux/Unix, Impala, Big Data technologies, Cloud technologies.

Roles & Responsibilities: Understand requirements/use cases and build efficient ETL solutions using Apache Spark, Python, Kafka, and Hive, targeting the Cloudera Data Platform.

Analyze requirements/use cases, convert functional requirements into concrete technical tasks, and provide reasonable effort estimates.

Work closely with data analysts/modelers and business users to understand data requirements. Convert requirements into high-level design, low-level design, and source-to-target documents.

Design, develop, and schedule data pipelines that handle large volumes of data within SLA. Work with solution architects, technical managers, and admins to understand SLAs and system limitations, and provide efficient solutions.

Demonstrate expertise in processing and aggregating large volumes of data using Spark; know a range of performance-tuning techniques and lead teams on optimization.

Develop efficient data ingestion and data governance frameworks per specification.

Improve the performance of existing Spark-based data ingestion and aggregation pipelines to meet SLAs.

Work proactively and independently with global teams to address project requirements; articulate issues and challenges with enough lead time to mitigate project delivery risks.

Plan production implementation activities, execute change requests, and resolve issues arising during production implementation.

Plan and execute large data migrations and historical data rebuild activities.

Perform code reviews/optimization and test case reviews. Demonstrate troubleshooting skill in resolving technical issues and bugs.

Demonstrate ownership and initiative. Bring in the best practices and solutions that best fit the client's problems and environment.