Purple Drive
Job Summary:
The Senior Data Engineer will be responsible for designing, building, and maintaining robust data pipelines and architectures on AWS to support scalable data processing, storage, and analytics. The ideal candidate will possess deep expertise in AWS services, PySpark, and data modeling, with proven experience in developing data-driven solutions that support business intelligence and analytics initiatives.
Key Responsibilities:
- Design, develop, and optimize data ingestion and transformation pipelines using AWS Glue, PySpark, and other AWS-native services (a minimal sketch of such a pipeline follows this list).
- Build and manage data lake and data warehouse solutions using Amazon S3 and Amazon Redshift.
- Develop and maintain data models, schemas, and ETL frameworks to ensure efficient data organization and accessibility.
- Collaborate with data analysts, data scientists, and business teams to understand data requirements and deliver reliable solutions.
- Implement data quality checks, validation frameworks, and automation to ensure data accuracy and reliability.
- Integrate version control and CI/CD practices using Git and related tools.
- Monitor and optimize the performance, scalability, and cost efficiency of data pipelines in AWS.
- Ensure compliance with data governance and security best practices in data handling and storage.
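To illustrate the kind of pipeline work described above, here is a minimal, hedged PySpark sketch: it reads raw CSV data from S3, applies a simple transformation with a basic data quality gate, and writes partitioned Parquet back to S3. The bucket names, paths, and columns (order_id, order_ts) are hypothetical placeholders, not details from this posting.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal sketch of an S3-to-S3 ingestion/transformation job.
# All bucket names, paths, and columns are hypothetical examples.
spark = SparkSession.builder.appName("orders_ingest_sketch").getOrCreate()

# Read raw CSV landed in an S3 "raw" zone.
raw = (spark.read
       .option("header", "true")
       .csv("s3://example-raw-bucket/orders/"))

# Basic transformation: type the timestamp and derive a partition column.
orders = (raw
          .withColumn("order_ts", F.to_timestamp("order_ts"))
          .withColumn("order_date", F.to_date("order_ts")))

# Simple data quality gate: fail fast if key fields are missing.
bad_rows = orders.filter(F.col("order_id").isNull() | F.col("order_ts").isNull()).count()
if bad_rows > 0:
    raise ValueError(f"Data quality check failed: {bad_rows} rows missing order_id/order_ts")

# Write partitioned Parquet to a curated zone for downstream Redshift/analytics use.
(orders.write
 .mode("overwrite")
 .partitionBy("order_date")
 .parquet("s3://example-curated-bucket/orders/"))
```

In a production Glue job the same logic would typically run inside a Glue script with a GlueContext, but the plain SparkSession form keeps the sketch self-contained.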
Required Skills and Qualifications:
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
- 6+ years of experience as a Data Engineer or in a similar role.
- Strong hands-on experience with AWS data services (Glue, Redshift, S3, Lambda, IAM, CloudWatch).
- Proficiency in PySpark for large-scale data transformation and processing.
- Deep understanding of data modeling, ETL development, and data warehousing concepts.
- Proficiency in SQL and performance optimization for analytical workloads (see the sketch after this list).
- Experience with Git for version control and collaborative development.
- Strong problem-solving, debugging, and analytical skills.
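As a small illustration of the SQL-for-analytics skill above, here is a hedged Spark SQL sketch (kept in Python for consistency with the PySpark example): it registers the curated orders data as a view and runs an aggregate whose filter on the partition column lets Spark prune partitions. The table and column names are the same hypothetical placeholders used earlier.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders_analytics_sketch").getOrCreate()

# Hypothetical curated dataset written by the ingestion sketch above.
orders = spark.read.parquet("s3://example-curated-bucket/orders/")
orders.createOrReplaceTempView("orders")

# Filtering on the partition column (order_date) lets Spark prune partitions,
# so only the relevant S3 prefixes are scanned -- a common optimization
# for analytical workloads.
daily_counts = spark.sql("""
    SELECT order_date, COUNT(*) AS order_count
    FROM orders
    WHERE order_date >= DATE '2024-01-01'
    GROUP BY order_date
    ORDER BY order_date
""")
daily_counts.show()
```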
Good to Have:
- Experience with Terraform or CloudFormation for infrastructure automation.
- Familiarity with Apache Airflow or similar orchestration tools (a minimal DAG sketch follows this list).
- Knowledge of data governance, cataloging, and metadata management practices.
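For the orchestration item above, here is a minimal, hedged Apache Airflow sketch of a daily DAG that could trigger a pipeline like the one outlined earlier. The dag_id, task name, and run_ingest stub are hypothetical; a real deployment would more likely submit the Glue/PySpark job through an AWS operator than a plain PythonOperator.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_ingest():
    # Placeholder: in practice this might start an AWS Glue job run
    # or submit a Spark application; kept as a stub for illustration.
    print("ingest step would run here")


with DAG(
    dag_id="orders_pipeline_sketch",   # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                 # Airflow 2.4+ argument; older versions use schedule_interval
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_orders", python_callable=run_ingest)
```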
Soft Skills:
- Excellent communication and documentation skills.
- Ability to work collaboratively in agile, cross-functional teams.
- Strong ownership mindset with attention to detail and quality.