TCS USA / Avance Consulting

Engineer (PySpark, Apache PySpark Development)

TCS USA / Avance Consulting, Jersey City, New Jersey, United States, 07390


Requirements

Strong hands-on experience with Apache Spark and PySpark for large-scale data processing.

Proficiency in Python programming with a focus on writing optimized, modular, and reusable code.

Solid understanding of ETL processes, data pipeline design, and data integration techniques.

Experience in performance tuning and optimization of Spark jobs in distributed computing environments.

Good knowledge of SQL and working with relational as well as NoSQL databases.

Familiarity with big data ecosystems (e.g., Hadoop, Hive, HDFS) and data warehouses (e.g., Snowflake, Redshift).

Understanding of data quality, validation, and error-handling frameworks.

Exposure to cloud platforms (AWS, Azure, or GCP) and their data services is a plus.

Strong problem-solving skills and ability to troubleshoot production issues.

Good communication and collaboration skills to work with cross-functional teams.

Roles & Responsibilities

Data Pipeline Development: Design, develop, and maintain scalable data pipelines using PySpark for ETL processes, integrating multiple data sources, transforming datasets, and loading into target systems.
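
For illustration only (not part of the posting itself), a minimal PySpark ETL pipeline of the kind described might look like the sketch below. The S3 paths, column names, and file formats are all hypothetical.

    # Minimal PySpark ETL sketch: extract, transform, load.
    # Paths, columns, and formats are hypothetical examples.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders_etl_example").getOrCreate()

    # Extract: read raw CSV data from a landing zone.
    raw = spark.read.option("header", "true").csv("s3://example-bucket/raw/orders/")

    # Transform: cast types, derive columns, drop rows that fail basic checks.
    orders = (
        raw.withColumn("amount", F.col("amount").cast("double"))
           .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
           .filter(F.col("amount").isNotNull())
    )

    # Load: write partitioned Parquet to the curated zone.
    orders.write.mode("overwrite").partitionBy("order_date").parquet(
        "s3://example-bucket/curated/orders/"
    )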

Performance Optimization: Optimize PySpark applications and Spark jobs for efficiency, fine-tuning configurations, and ensuring high performance with large-scale datasets.
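
Typical tuning levers here include right-sizing shuffle parallelism, broadcasting small dimension tables, and caching reused DataFrames. A brief sketch follows; the configuration values and paths are illustrative, not recommendations.

    # Common Spark tuning levers (values are illustrative only).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("tuning_example")
        # Right-size shuffle parallelism for the data volume (default is 200).
        .config("spark.sql.shuffle.partitions", "400")
        .getOrCreate()
    )

    facts = spark.read.parquet("s3://example-bucket/curated/orders/")
    dims = spark.read.parquet("s3://example-bucket/curated/customers/")

    # Broadcast the small dimension table to avoid a shuffle join.
    joined = facts.join(F.broadcast(dims), "customer_id")

    # Cache a DataFrame that several downstream actions reuse.
    joined.cache()
    joined.count()  # first action materializes the cache

    # Inspect the physical plan when diagnosing slow stages.
    joined.explain()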

Data Quality and Integrity: Implement validation rules, error-handling mechanisms, and monitoring frameworks to ensure accuracy, consistency, and integrity of data throughout its lifecycle.
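
One common pattern is to split incoming rows into valid and reject sets and halt the load when the reject rate crosses a threshold. The sketch below assumes hypothetical rule definitions, paths, and a 5% threshold.

    # Illustrative data-quality gate: separate valid rows from rejects.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dq_example").getOrCreate()
    orders = spark.read.parquet("s3://example-bucket/curated/orders/")

    # Validation rules (hypothetical): required keys and positive amounts.
    rules = (
        F.col("order_id").isNotNull()
        & (F.col("amount") > 0)
        & F.col("order_date").isNotNull()
    )

    valid = orders.filter(rules)
    rejects = orders.filter(~rules)

    # Simple monitoring signal: fail fast if too many rows are rejected.
    total = orders.count()
    if total > 0 and rejects.count() / total > 0.05:  # 5% threshold is illustrative
        raise ValueError("Reject rate exceeded threshold; halting load")

    # Persist rejects for later inspection and reprocessing.
    rejects.write.mode("append").parquet("s3://example-bucket/rejects/orders/")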

Collaboration: Partner with data engineers, data scientists, and business analysts to translate business requirements into technical solutions that support data-driven decision-making.

Code Development and Maintenance: Write clean, efficient, and well-documented PySpark code, follow coding standards, and actively participate in peer code reviews.

Troubleshooting and Support: Monitor Spark jobs, identify and resolve issues, and provide production support for PySpark-based applications.
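
As a closing illustration, a typical support pattern pairs structured logging with explicit failure handling so a scheduler can alert and retry. The job name, input path, and empty-input check below are assumptions, not details from the posting.

    # Illustrative production-support wrapper around a Spark job.
    import logging
    import sys

    from pyspark.sql import SparkSession

    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    log = logging.getLogger("orders_etl")

    spark = SparkSession.builder.appName("orders_etl").getOrCreate()

    try:
        df = spark.read.parquet("s3://example-bucket/curated/orders/")
        row_count = df.count()
        log.info("Loaded %d rows", row_count)
        if row_count == 0:
            raise RuntimeError("Empty input: upstream extract may have failed")
    except Exception:
        # Surface the full stack trace in the job logs, then fail the run
        # so the scheduler (e.g., Airflow) can alert and retry.
        log.exception("Orders ETL job failed")
        spark.stop()
        sys.exit(1)

    spark.stop()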