Big Resourcing
Job Title: Senior PySpark Engineer – AWS/EMR and Junior PySpark Engineer – AWS/EMR (Multiple openings)
Location: Remote (EST time zone preferred), 5 days a month in the office
Duration: 6‑month contract
About BigRio
BigRio is a remote-based technology consulting firm with headquarters in Boston, MA. We deliver software solutions ranging from custom development and software implementation to data analytics and machine learning/AI integrations. As a one-stop shop, we attract clients from a variety of industries due to our proven ability to deliver cutting‑edge, cost‑effective software solutions.
Job Overview
We are seeking both Junior and Senior PySpark Engineers with strong hands‑on experience in building distributed data pipelines using Apache Spark on AWS EMR. The ideal candidate is proficient in Python, has worked with Databricks, and has a solid understanding of GxP‑compliant environments. This is a coding‑heavy role (not DevOps or AWS administration) where you’ll contribute directly to the architecture and development of robust data solutions in a highly regulated, cloud‑native environment.
Key Responsibilities
Design, develop, and maintain distributed ETL data pipelines using PySpark on AWS EMR
Work within a GxP‑compliant environment, ensuring data integrity and regulatory alignment
Write clean, scalable, and efficient PySpark code for large‑scale data processing
Utilize AWS cloud services for pipeline orchestration, compute, and storage
Collaborate closely with cross‑functional teams to deliver end‑to‑end data solutions
Participate in code reviews, testing, and deployment of pipeline components
Ensure performance optimization, fault tolerance, and scalability of data workflows
Senior role: 8–10 years of experience in software or data engineering with a focus on distributed systems
Junior role: 2–4 years of experience in software or data engineering with a focus on distributed systems
Deep hands‑on experience with Apache Spark, PySpark, and AWS (especially EMR)
Experience building pipelines using Databricks is required
Strong programming skills in Python
Solid understanding of cloud‑native architectures
Familiarity with GxP compliance and working in regulated data environments
Proven ability to independently design and develop data pipelines (not a DevOps/AWS admin role)
Experience with distributed computing and high‑volume ETL pipelines
Equal Opportunity Statement
BigRio is an equal‑opportunity employer. We prohibit discrimination and harassment of any kind based on race, religion, national origin, sex, sexual orientation, gender identity, age, pregnancy, status as a qualified individual with disability, protected veteran status, or other protected characteristic as outlined by federal, state, or local laws. BigRio makes hiring decisions based solely on qualifications, merit, and business needs at the time. All qualified applicants will receive equal consideration for employment.
BigRio is a leading AI, Gen AI, Data and Analytics professional services company. We are focused on Healthcare, Pharma, Digital Health, Provider, and Payer Industry segments with several innovative solutions.
Harvard Square, One Mifflin Place Suite 400 Cambridge, MA 02138