Junior PySpark Engineer - AWS/EMR
Saviance - Boston, Massachusetts, US, 02298
Overview
Location: Remote (EST time zone preferred); 5 days a month in the office
Duration: 6-month contract
About BigRio:
BigRio is a remote-based technology consulting firm headquartered in Boston, MA. We deliver software solutions ranging from custom development and software implementation to data analytics and machine learning/AI integrations. As a one-stop shop, we attract clients from a variety of industries due to our proven ability to deliver cutting-edge, cost-effective software solutions.
Job Overview:
We are seeking a Junior PySpark Engineer with strong hands-on experience in building distributed data pipelines using Apache Spark on AWS EMR. The ideal candidate is proficient in Python, has worked with Databricks, and has a solid understanding of GxP-compliant environments. This is a coding-heavy role - not DevOps or AWS administration - where you'll contribute directly to the architecture and development of robust data solutions in a highly regulated, cloud-native environment.
Key Responsibilities:
- Design, develop, and maintain distributed ETL data pipelines using PySpark on AWS EMR
- Work within a GxP-compliant environment, ensuring data integrity and regulatory alignment
- Write clean, scalable, and efficient PySpark code for large-scale data processing
- Utilize AWS cloud services for pipeline orchestration, compute, and storage
- Collaborate closely with cross-functional teams to deliver end-to-end data solutions
- Participate in code reviews, testing, and deployment of pipeline components
- Ensure performance optimization, fault tolerance, and scalability of data workflows
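For candidates gauging fit, the following is a minimal sketch of the kind of PySpark ETL job the responsibilities above describe: read raw data from S3, apply transformations, and write curated output back to S3, typically submitted to an EMR cluster via spark-submit or an EMR step. All bucket paths, column names, and the job name are hypothetical placeholders, not details of any actual BigRio pipeline.

# Minimal PySpark ETL sketch; paths and column names are illustrative only.
from pyspark.sql import SparkSession, functions as F

def main():
    # On EMR, the SparkSession picks up cluster configuration from spark-submit.
    spark = SparkSession.builder.appName("example-etl-job").getOrCreate()

    # Extract: read raw records from S3 (hypothetical bucket/prefix).
    raw = spark.read.parquet("s3://example-bucket/raw/events/")

    # Transform: basic cleansing and a simple daily aggregation.
    cleaned = (
        raw.dropDuplicates(["event_id"])
           .filter(F.col("event_ts").isNotNull())
           .withColumn("event_date", F.to_date("event_ts"))
    )
    daily_counts = cleaned.groupBy("event_date", "event_type").count()

    # Load: write curated output back to S3, partitioned by date.
    daily_counts.write.mode("overwrite").partitionBy("event_date").parquet(
        "s3://example-bucket/curated/daily_event_counts/"
    )

    spark.stop()

if __name__ == "__main__":
    main()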
Required Qualifications:
- 2-4 years of experience in software or data engineering with a focus on distributed systems
- Deep hands-on experience with Apache Spark, PySpark, and AWS (especially EMR)
- Experience building pipelines using Databricks required
- Strong programming skills in Python
- Solid understanding of cloud-native architectures
- Familiarity with GxP compliance and working in regulated data environments
- Proven ability to independently design and develop data pipelines (not a DevOps/AWS admin role)
- Experience with distributed computing and high-volume ETL pipelines