Purple Drive Technologies LLC
Job Title: Big Data Engineer
Location: Atlanta, GA / Tampa, FL / Dallas, TX
Job Summary
We are seeking an experienced Big Data Engineer to design, build, and optimize large-scale data pipelines and distributed systems across cloud and on-prem platforms. The ideal candidate will have strong expertise in Spark, the Hadoop ecosystem, cloud data services, ETL/ELT design, streaming platforms, and best practices for scalable data processing.
Responsibilities
Design, develop, and maintain big data pipelines using Spark (PySpark/Scala), Hadoop, Kafka, and distributed computing frameworks (a minimal PySpark sketch follows this list).
Build and optimize ETL/ELT pipelines for structured and unstructured data across cloud and on-prem data platforms.
Work with cloud technologies (AWS, Azure, or Google Cloud Platform) including Data Lake, Databricks, EMR, Glue, Dataflow, Synapse, or Snowflake.
Develop Delta Lake / Lakehouse architecture for high-performance ingestion and processing (see the Delta Lake upsert sketch below).
Build real-time streaming solutions using Kafka, Spark Streaming, Kinesis, or Event Hubs (see the Kafka streaming sketch below).
Collaborate with data architects, analysts, and application teams to gather data requirements and deliver scalable solutions.
Implement and maintain CI/CD pipelines, automated jobs, and orchestration using Airflow, Azure Data Factory, or Glue Workflows (see the Airflow DAG sketch below).
Optimize data pipelines for performance, cost efficiency, and reliability.
Ensure data quality, validation, governance, and lineage best practices across the ecosystem.
Troubleshoot and resolve production issues in a high-availability environment.
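For candidates gauging fit, a minimal PySpark batch ETL sketch of the kind of pipeline described above is shown here. The bucket paths, the "events" dataset, and the column names (event_id, event_ts) are illustrative assumptions, not specifics of this role.

```python
# Minimal PySpark batch ETL sketch. Paths, dataset, and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-events-etl").getOrCreate()

# Extract: read raw JSON landed in cloud object storage.
raw = spark.read.json("s3://example-raw-bucket/events/2024-01-01/")

# Transform: deduplicate, drop records without a timestamp, derive a partition column.
clean = (
    raw.dropDuplicates(["event_id"])
       .filter(F.col("event_ts").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
)

# Load: write partitioned Parquet for downstream consumers.
clean.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-curated-bucket/events/"
)
```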
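For the Delta Lake / Lakehouse responsibility, a common ingestion pattern is an idempotent upsert via MERGE. This sketch assumes the delta-spark package; the table paths and the event_id join key are assumptions for illustration.

```python
# Hypothetical Delta Lake upsert (MERGE) using the delta-spark package.
from delta import configure_spark_with_delta_pip
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("delta-upsert")
    # Documented settings that enable Delta Lake on a plain Spark session.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

updates = spark.read.parquet("s3://example-curated-bucket/events_increment/")
target = DeltaTable.forPath(spark, "s3://example-lakehouse/events_delta/")

# MERGE: update rows that already exist, insert the rest (idempotent ingestion).
(
    target.alias("t")
    .merge(updates.alias("u"), "t.event_id = u.event_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```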
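The real-time streaming responsibility might look like the following Spark Structured Streaming job reading from Kafka. The broker address, topic, message schema, and sink paths are assumptions for illustration.

```python
# Spark Structured Streaming sketch reading from Kafka; names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-events-stream").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_ts", TimestampType()),
    StructField("payload", StringType()),
])

parsed = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    # Kafka delivers raw bytes; cast and parse the JSON value against the schema.
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

query = (
    parsed.writeStream.format("parquet")
    .option("path", "s3://example-stream-sink/events/")
    # Checkpointing is required for restartable, exactly-once file sinks.
    .option("checkpointLocation", "s3://example-stream-sink/_checkpoints/events/")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```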
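Finally, a minimal Airflow DAG could orchestrate the batch job above. The DAG id, schedule, and spark-submit command are placeholders, and the schedule parameter as written assumes Airflow 2.4+.

```python
# Minimal Airflow orchestration sketch; all names and commands are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_events_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # use schedule_interval on Airflow versions before 2.4
    catchup=False,
) as dag:
    run_spark_job = BashOperator(
        task_id="run_spark_job",
        bash_command="spark-submit --deploy-mode cluster etl_job.py",
    )
```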
Required Skills & Qualifications
5-8 years of experience as a Big Data Engineer or Data Engineer.
Strong hands-on skills with Spark (PySpark or Scala), Hadoop, Hive, HDFS, and MapReduce.
Experience working with Databricks, EMR, Glue, Synapse, Dataproc, or equivalent big-data compute engines.
Proficiency in Python or Scala for data engineering.
Experience with Kafka or other event-streaming technologies.
Strong understanding of cloud data architectures (AWS S3/Glue/EMR, Azure ADLS/ADF/Databricks, or Google Cloud Platform BigQuery/Dataproc).
Solid SQL skills and experience with relational and NoSQL databases.
Experience with version control (Git) and CI/CD tools (Jenkins, Azure DevOps, GitHub Actions).
Hands-on experience with Airflow, ADF, or other orchestration and scheduling tools.
Familiarity with data modeling, data governance, and best practices for data quality (an illustrative data-quality check follows this list).
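As a small illustration of the data-quality practices mentioned above, the sketch below gates a pipeline run on null or duplicate keys. The path and the event_id key column are assumptions, not specifics of this role.

```python
# Illustrative data-quality gate in PySpark: fail the run on null or duplicate
# primary keys. The input path and key column are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
df = spark.read.parquet("s3://example-curated-bucket/events/")

null_keys = df.filter(F.col("event_id").isNull()).count()
duplicates = df.count() - df.dropDuplicates(["event_id"]).count()

if null_keys > 0 or duplicates > 0:
    raise ValueError(
        f"Data quality gate failed: {null_keys} null keys, {duplicates} duplicates"
    )
```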