Qode

Job Summary

We are looking for an experienced AWS Data Engineer with strong expertise in Python and PySpark to design, build, and maintain large-scale data pipelines and cloud-based data platforms. The ideal candidate will have hands-on experience with AWS services, distributed data processing, and implementing scalable solutions for analytics and machine learning use cases.

Key Responsibilities
· Design, develop, and optimize data pipelines using Python, PySpark, and SQL (a minimal example follows this list).
· Build and manage ETL/ELT workflows for structured and unstructured data.
· Leverage AWS services (S3, Glue, EMR, Redshift, Lambda, Athena, Kinesis, Step Functions, RDS) for data engineering solutions.
· Implement data lake/data warehouse architectures and ensure data quality, consistency, and security.
· Work with large-scale distributed systems for real-time and batch data processing.
· Collaborate with data scientists, analysts, and business stakeholders to deliver high-quality, reliable data solutions.
· Develop and enforce data governance, monitoring, and performance optimization best practices.
· Deploy and manage CI/CD pipelines for data workflows using AWS tools (CodePipeline, CodeBuild) or GitHub Actions.
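The sketch below illustrates the kind of PySpark batch pipeline described in the first responsibility: read raw JSON from S3, apply basic cleansing, and write date-partitioned Parquet to a curated zone. It is a minimal sketch only; the bucket names, paths, and column names are hypothetical placeholders and not part of this posting.

# Minimal PySpark ETL sketch (illustrative; all names are assumptions).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl-sketch").getOrCreate()

# Read raw, semi-structured events from S3 (hypothetical path).
raw = spark.read.json("s3://example-raw-bucket/orders/")

# Basic cleansing and typing before publishing to the curated zone.
curated = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .filter(F.col("amount").isNotNull())
)

# Date-partitioned Parquet lets Athena or Redshift Spectrum prune files at query time.
(curated.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-curated-bucket/orders/"))

spark.stop()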
Required Skills & Qualifications
· Strong programming skills in Python and hands-on experience with PySpark.
· Proficiency in SQL for complex queries, transformations, and performance tuning.
· Solid experience with the AWS cloud ecosystem (S3, Glue, EMR, Redshift, Athena, Lambda, etc.).
· Experience working with data lakes, data warehouses, and distributed systems.
· Knowledge of ETL frameworks, workflow orchestration (Airflow, Step Functions, or similar), and automation (a sample orchestration sketch follows this list).
· Familiarity with Docker, Kubernetes, or containerized deployments.
· Strong understanding of data modeling, partitioning, and optimization techniques.
· Excellent problem-solving, debugging, and communication skills.
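As a point of reference for the orchestration requirement, the sketch below shows how a daily pipeline like the one above might be scheduled with Airflow. The DAG id, schedule, and task callables are assumptions for illustration (Airflow 2.4+ syntax), not a description of Qode's actual stack.

# Hypothetical daily DAG; ids, schedule, and callables are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    """Land raw files from the source system in S3 (placeholder)."""


def transform():
    """Run the PySpark job that curates the raw data (placeholder)."""


def load():
    """Publish curated tables to the warehouse (placeholder)."""


with DAG(
    dag_id="orders_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Simple linear dependency: extract, then transform, then load.
    extract_task >> transform_task >> load_task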