Saviance

Junior PySpark Engineer - AWS/EMR

Saviance, Boston, Massachusetts, US 02298


Job Title: Junior PySpark Engineer - AWS/EMR

Location: Remote (EST time zone preferred); 5 days per month in the office

Duration: 6-month contract

About BigRio:

BigRio is a remote-based technology consulting firm with headquarters in Boston, MA. We deliver software solutions ranging from custom development and software implementation to data analytics and machine learning/AI integrations. As a one-stop shop, we attract clients from a variety of industries due to our proven ability to deliver cutting-edge, cost-effective software solutions.

Job Overview:

We are seeking a Junior PySpark Engineer with strong hands-on experience in building distributed data pipelines using Apache Spark on AWS EMR. The ideal candidate is proficient in Python, has worked with Databricks, and has a solid understanding of GxP-compliant environments. This is a coding-heavy role - not DevOps or AWS administration - where you'll contribute directly to the architecture and development of robust data solutions in a highly regulated, cloud-native environment.

Key Responsibilities:

- Design, develop, and maintain distributed ETL data pipelines using PySpark on AWS EMR (a sketch of this kind of job follows this list)
- Work within a GxP-compliant environment, ensuring data integrity and regulatory alignment
- Write clean, scalable, and efficient PySpark code for large-scale data processing
- Utilize AWS cloud services for pipeline orchestration, compute, and storage
- Collaborate closely with cross-functional teams to deliver end-to-end data solutions
- Participate in code reviews, testing, and deployment of pipeline components
- Ensure performance optimization, fault tolerance, and scalability of data workflows
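To give candidates a concrete sense of the day-to-day work, here is a minimal sketch of the kind of PySpark ETL job this role involves. It is illustrative only: the bucket paths, column names, and app name are hypothetical, not taken from any BigRio project.

# Illustrative PySpark ETL job; all S3 paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

def main():
    # On EMR, spark-submit provides the cluster context; no master URL is set here.
    spark = SparkSession.builder.appName("example-etl").getOrCreate()

    # Extract: read raw event records from S3.
    raw = spark.read.parquet("s3://example-raw-bucket/events/")

    # Transform: basic cleansing plus a daily aggregate.
    cleaned = (
        raw.dropDuplicates(["event_id"])
           .filter(F.col("event_ts").isNotNull())
           .withColumn("event_date", F.to_date("event_ts"))
    )
    daily_counts = cleaned.groupBy("event_date", "event_type").agg(
        F.count("*").alias("event_count")
    )

    # Load: write partitioned output back to S3.
    daily_counts.write.mode("overwrite").partitionBy("event_date").parquet(
        "s3://example-curated-bucket/daily_counts/"
    )

    spark.stop()

if __name__ == "__main__":
    main()

On EMR, a job like this would typically be packaged and launched with spark-submit, for example as an EMR step.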

Required Qualifications:

- 2-4 years of experience in software or data engineering with a focus on distributed systems
- Deep hands-on experience with Apache Spark, PySpark, and AWS (especially EMR)
- Experience building pipelines using Databricks (required)
- Strong programming skills in Python
- Solid understanding of cloud-native architectures
- Familiarity with GxP compliance and working in regulated data environments
- Proven ability to independently design and develop data pipelines (not a DevOps/AWS admin role)
- Experience with distributed computing and high-volume ETL pipelines