Purple Drive
Role Overview
We are seeking a Data Engineer with strong expertise in Databricks, Python, and PySpark, coupled with experience in CI/CD pipelines. The ideal candidate will have a solid background in data management, data warehousing, and data integration, with proven experience in developing scalable, high-performance data solutions on cloud platforms.
Key Responsibilities
- Design, build, and optimize data pipelines using Databricks, PySpark, and Python.
- Develop and maintain data quality rules, transformations, and mappings to ensure data accuracy and consistency.
- Write and optimize complex SQL queries for large-scale data processing.
- Work with cloud platforms (AWS/Azure/GCP) to deliver secure, scalable solutions.
- Support data integration, data warehousing, and data cleansing initiatives.
- Collaborate with cross-functional teams to deliver high-quality solutions following Agile Scrum practices.
- Troubleshoot production issues in Oracle and MS SQL Server environments and drive timely resolution.
- Implement CI/CD pipelines for continuous integration, testing, and deployment.
- Follow best practices for data management and the software development lifecycle (SDLC).

Required Skills & Experience
- 6-8 years of overall IT/data engineering experience.
- Must have: Databricks, Python, PySpark, and CI/CD experience.
- 3-5 years of experience in data management, warehousing, integration, and cleansing.
- Strong SQL programming skills (Oracle and MS SQL Server preferred).
- 2+ years of experience with Python (Perl is a plus).
- 3+ years working with cloud technologies (AWS, Azure, or GCP).
- Strong analytical, problem-solving, and troubleshooting skills.
- Experience in Agile Scrum delivery and SDLC processes.

Nice-to-Have Skills
- Experience with data governance and metadata management.
- Exposure to ETL tools and orchestration frameworks (Airflow, ADF, etc.).
- Knowledge of DevOps practices and containerization (Docker, Kubernetes).