Purple Drive
Data Engineer
Location: Pittsburgh, PA
Job Description
We are seeking a skilled Data Engineer with strong expertise in Databricks, PySpark, Python, and SQL to design, develop, and maintain scalable data pipelines for advanced analytics and business intelligence. The ideal candidate will have hands-on experience working with cloud-based platforms, modern data lakehouse architectures, and distributed data processing systems. You will collaborate with cross-functional teams to deliver high-performance, reliable, and secure data solutions.
Key Responsibilities
Design, build, and maintain data pipelines using Databricks and PySpark.
Develop ETL/ELT workflows for structured, semi-structured, and unstructured data.
Write and optimize Python and SQL scripts for data transformation, validation, and reporting.
Implement data lakehouse solutions for scalable storage and analytics.
Ensure data quality, governance, and lineage tracking across pipelines.
Collaborate with analysts, data scientists, and business teams to deliver business-ready datasets.
Deploy and maintain pipelines in cloud environments (AWS / Azure / GCP).
Monitor and troubleshoot data workflows, ensuring high availability and performance.
Mandatory Skills
Databricks - strong hands-on experience with data engineering and ML workflows.
PySpark - expertise in distributed data processing.
Python - proficient in scripting, automation, and ETL development.
SQL - strong ability to write complex queries, optimize performance, and work with large datasets.
Cloud Platforms - AWS / Azure / GCP (experience with storage, compute, and orchestration services).
Data Warehousing - knowledge of Redshift, Snowflake, or Synapse (preferred).
Data Governance & Quality - experience with schema validation and testing frameworks.
Collaboration - ability to work in Agile teams and communicate effectively with stakeholders.