Purple Drive
Job Summary:
The Senior Data Engineer will be responsible for designing, building, and maintaining robust data pipelines and architectures on AWS to support scalable data processing, storage, and analytics. The ideal candidate will possess deep expertise in AWS services, PySpark, and data modeling, with proven experience in developing data-driven solutions that support business intelligence and analytics initiatives.
Key Responsibilities:
- Design, develop, and optimize data ingestion and transformation pipelines using AWS Glue, PySpark, and other AWS-native services (a minimal sketch of such a pipeline follows this list).
- Build and manage data lake and data warehouse solutions using Amazon S3 and Amazon Redshift.
- Develop and maintain data models, schemas, and ETL frameworks to ensure efficient data organization and accessibility.
- Collaborate with data analysts, data scientists, and business teams to understand data requirements and deliver reliable solutions.
- Implement data quality checks, validation frameworks, and automation to ensure data accuracy and reliability.
- Integrate version control and CI/CD practices using Git and related tools.
- Monitor and optimize the performance, scalability, and cost efficiency of data pipelines in AWS.
- Ensure compliance with data governance and security best practices in data handling and storage.
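To illustrate the kind of pipeline work described above, here is a minimal, hedged PySpark sketch: it reads raw CSV data from S3, applies a simple transformation with a basic data quality gate, and writes partitioned Parquet back to S3. The bucket names, paths, and columns (order_id, order_ts) are hypothetical placeholders, not details from this posting.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal sketch of an S3-to-S3 ingestion/transformation job.
# All bucket names, paths, and columns are hypothetical examples.
spark = SparkSession.builder.appName("orders_ingest_sketch").getOrCreate()

# Read raw CSV landed in an S3 "raw" zone.
raw = (spark.read
       .option("header", "true")
       .csv("s3://example-raw-bucket/orders/"))

# Basic transformation: type the timestamp and derive a partition column.
orders = (raw
          .withColumn("order_ts", F.to_timestamp("order_ts"))
          .withColumn("order_date", F.to_date("order_ts")))

# Simple data quality gate: fail fast if key fields are missing.
bad_rows = orders.filter(F.col("order_id").isNull() | F.col("order_ts").isNull()).count()
if bad_rows > 0:
    raise ValueError(f"Data quality check failed: {bad_rows} rows missing order_id/order_ts")

# Write partitioned Parquet to a curated zone for downstream Redshift/analytics use.
(orders.write
 .mode("overwrite")
 .partitionBy("order_date")
 .parquet("s3://example-curated-bucket/orders/"))
```

In a production Glue job the same logic would typically run inside a Glue script with a GlueContext, but the plain SparkSession form keeps the sketch self-contained.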
Required Skills and Qualifications:
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
- 6+ years of experience as a Data Engineer or in a similar role.
- Strong hands-on experience with AWS data services (Glue, Redshift, S3, Lambda, IAM, CloudWatch).
- Proficiency in PySpark for large-scale data transformation and processing.
- Deep understanding of data modeling, ETL development, and data warehousing concepts.
- Proficiency in SQL and performance optimization for analytical workloads (see the sketch after this list).
- Experience with Git for version control and collaborative development.
- Strong problem-solving, debugging, and analytical skills.
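As a small illustration of the SQL-for-analytics skill above, here is a hedged Spark SQL sketch (kept in Python for consistency with the PySpark example): it registers the curated orders data as a view and runs an aggregate whose filter on the partition column lets Spark prune partitions. The table and column names are the same hypothetical placeholders used earlier.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders_analytics_sketch").getOrCreate()

# Hypothetical curated dataset written by the ingestion sketch above.
orders = spark.read.parquet("s3://example-curated-bucket/orders/")
orders.createOrReplaceTempView("orders")

# Filtering on the partition column (order_date) lets Spark prune partitions,
# so only the relevant S3 prefixes are scanned -- a common optimization
# for analytical workloads.
daily_counts = spark.sql("""
    SELECT order_date, COUNT(*) AS order_count
    FROM orders
    WHERE order_date >= DATE '2024-01-01'
    GROUP BY order_date
    ORDER BY order_date
""")
daily_counts.show()
```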
Good to Have:
- Experience with Terraform or CloudFormation for infrastructure automation.
- Familiarity with Apache Airflow or similar orchestration tools (a minimal DAG sketch follows this list).
- Knowledge of data governance, cataloging, and metadata management practices.
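For the orchestration item above, here is a minimal, hedged Apache Airflow sketch of a daily DAG that could trigger a pipeline like the one outlined earlier. The dag_id, task name, and run_ingest stub are hypothetical; a real deployment would more likely submit the Glue/PySpark job through an AWS operator than a plain PythonOperator.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_ingest():
    # Placeholder: in practice this might start an AWS Glue job run
    # or submit a Spark application; kept as a stub for illustration.
    print("ingest step would run here")


with DAG(
    dag_id="orders_pipeline_sketch",   # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                 # Airflow 2.4+ argument; older versions use schedule_interval
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_orders", python_callable=run_ingest)
```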
Soft Skills:
- Excellent communication and documentation skills.
- Ability to work collaboratively in agile, cross-functional teams.
- Strong ownership mindset with attention to detail and quality.