Python Developer

Capgemini USA / Avance Consulting, Hanover Twp

Key Responsibilities
Design, develop, and maintain scalable ETL pipelines using PySpark and Python to process large-scale datasets across distributed environments (see the sketch after this list).
Implement complex data transformation logic and optimize Spark jobs for performance and cost-efficiency.
Collaborate with data scientists, analysts, and business stakeholders to understand data requirements and deliver robust solutions.
Integrate data from multiple sources (structured, semi-structured, unstructured) into unified data models.
Ensure data quality, consistency, and governance through validation, monitoring, and error handling mechanisms.
Participate in system design discussions and contribute to architectural decisions for big data platforms.
Document technical solutions and maintain version control using Git and CI/CD tools like Jenkins or Azure DevOps.
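
To give a concrete sense of the PySpark work described above, here is a minimal ETL sketch: extract raw data, apply transformations, run a simple quality check, and write a partitioned output. The HDFS paths, the orders schema, and the amount >= 0 validation rule are illustrative placeholders, not details of any actual pipeline for this role.

```python
# Minimal PySpark ETL sketch -- paths, schema, and business rules are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("orders_daily_etl")  # hypothetical job name
    .getOrCreate()
)

# Extract: read raw semi-structured data from a hypothetical landing zone
raw = spark.read.json("hdfs:///landing/orders/")

# Transform: filter, derive columns, deduplicate, and repartition by the write key
orders = (
    raw.filter(F.col("order_id").isNotNull())            # basic quality gate
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
       .dropDuplicates(["order_id"])
       .repartition("order_date")
)

# Simple data-quality check: fail fast if negative amounts slip through
bad_rows = orders.filter(F.col("amount") < 0).count()
if bad_rows > 0:
    raise ValueError(f"{bad_rows} rows failed the amount >= 0 validation")

# Load: write partitioned Parquet to a hypothetical warehouse path
(
    orders.write
          .mode("overwrite")
          .partitionBy("order_date")
          .parquet("hdfs:///warehouse/orders_clean/")
)

spark.stop()
```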

Required Skills

* Strong programming skills in Python and PySpark.
* Hands-on experience with Apache Spark, Hadoop, Hive, and HDFS.
* Proficiency in SQL and working with both relational and NoSQL databases (see the sketch after this list).
* Strong debugging, performance tuning, and problem-solving skills.
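
As a companion to the SQL and Spark skills listed above, here is a brief sketch of combining a relational source (read over JDBC) with a Hive table in Spark SQL. The connection URL, credentials, table names, and columns are hypothetical.

```python
# Sketch: joining a relational source with a Hive table in Spark SQL.
# Connection details, table names, and columns are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("customer_orders_join")   # hypothetical job name
    .enableHiveSupport()               # lets spark.sql() resolve Hive tables
    .getOrCreate()
)

# Relational source via JDBC (hypothetical PostgreSQL database)
customers = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/sales")
    .option("dbtable", "public.customers")
    .option("user", "etl_user")
    .option("password", "********")
    .load()
)
customers.createOrReplaceTempView("customers")

# Join against a Hive-managed table and aggregate with plain SQL
summary = spark.sql("""
    SELECT c.customer_id,
           c.region,
           SUM(o.amount) AS total_amount
    FROM customers c
    JOIN warehouse.orders_clean o
      ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.region
""")

summary.show(10)
spark.stop()
```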