Purple Drive Technologies LLC
Job Title: Big Data Engineer
Location: Atlanta, GA / Tampa, FL / Dallas, TX
Job Summary
We are seeking an experienced Big Data Engineer to design, build, and optimize large-scale data pipelines and distributed systems across cloud and on-prem platforms. The ideal candidate will have strong expertise in Spark, the Hadoop ecosystem, cloud data services, ETL/ELT design, streaming platforms, and best practices for scalable data processing.
Responsibilities
Design, develop, and maintain big data pipelines using Spark (PySpark/Scala), Hadoop, Kafka, and distributed computing frameworks (a minimal PySpark sketch follows this list).
Build and optimize ETL/ELT pipelines for structured and unstructured data across cloud and on-prem data platforms.
Work with cloud technologies (AWS, Azure, or Google Cloud Platform) including Data Lake, Databricks, EMR, Glue, Dataflow, Synapse, or Snowflake.
Develop Delta Lake / Lakehouse architecture for high-performance ingestion and processing (see the Delta Lake upsert sketch below).
Build real-time streaming solutions using Kafka, Spark Streaming, Kinesis, or Event Hubs (see the Kafka streaming sketch below).
Collaborate with data architects, analysts, and application teams to gather data requirements and deliver scalable solutions.
Implement and maintain CI/CD pipelines, automated jobs, and orchestration using Airflow, Azure Data Factory, or Glue Workflows (see the Airflow DAG sketch below).
Optimize data pipelines for performance, cost efficiency, and reliability.
Ensure data quality, validation, governance, and lineage best practices across the ecosystem.
Troubleshoot and resolve production issues in a high-availability environment.
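For candidates gauging fit, a minimal PySpark batch ETL sketch of the kind of pipeline described above is shown here. The bucket paths, the "events" dataset, and the column names (event_id, event_ts) are illustrative assumptions, not specifics of this role.

```python
# Minimal PySpark batch ETL sketch. Paths, dataset, and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-events-etl").getOrCreate()

# Extract: read raw JSON landed in cloud object storage.
raw = spark.read.json("s3://example-raw-bucket/events/2024-01-01/")

# Transform: deduplicate, drop records without a timestamp, derive a partition column.
clean = (
    raw.dropDuplicates(["event_id"])
       .filter(F.col("event_ts").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
)

# Load: write partitioned Parquet for downstream consumers.
clean.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-curated-bucket/events/"
)
```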
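For the Delta Lake / Lakehouse responsibility, a common ingestion pattern is an idempotent upsert via MERGE. This sketch assumes the delta-spark package; the table paths and the event_id join key are assumptions for illustration.

```python
# Hypothetical Delta Lake upsert (MERGE) using the delta-spark package.
from delta import configure_spark_with_delta_pip
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("delta-upsert")
    # Documented settings that enable Delta Lake on a plain Spark session.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

updates = spark.read.parquet("s3://example-curated-bucket/events_increment/")
target = DeltaTable.forPath(spark, "s3://example-lakehouse/events_delta/")

# MERGE: update rows that already exist, insert the rest (idempotent ingestion).
(
    target.alias("t")
    .merge(updates.alias("u"), "t.event_id = u.event_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```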
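The real-time streaming responsibility might look like the following Spark Structured Streaming job reading from Kafka. The broker address, topic, message schema, and sink paths are assumptions for illustration.

```python
# Spark Structured Streaming sketch reading from Kafka; names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-events-stream").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_ts", TimestampType()),
    StructField("payload", StringType()),
])

parsed = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    # Kafka delivers raw bytes; cast and parse the JSON value against the schema.
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

query = (
    parsed.writeStream.format("parquet")
    .option("path", "s3://example-stream-sink/events/")
    # Checkpointing is required for restartable, exactly-once file sinks.
    .option("checkpointLocation", "s3://example-stream-sink/_checkpoints/events/")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```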
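Finally, a minimal Airflow DAG could orchestrate the batch job above. The DAG id, schedule, and spark-submit command are placeholders, and the schedule parameter as written assumes Airflow 2.4+.

```python
# Minimal Airflow orchestration sketch; all names and commands are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_events_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # use schedule_interval on Airflow versions before 2.4
    catchup=False,
) as dag:
    run_spark_job = BashOperator(
        task_id="run_spark_job",
        bash_command="spark-submit --deploy-mode cluster etl_job.py",
    )
```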
Required Skills & Qualifications
5-8 years of experience as a Big Data Engineer or Data Engineer.
Strong hands-on skills with Spark (PySpark or Scala), Hadoop, Hive, HDFS, and MapReduce.
Experience working with Databricks, EMR, Glue, Synapse, Dataproc, or equivalent big-data compute engines.
Proficiency in Python or Scala for data engineering.
Experience with Kafka or other event-streaming technologies.
Strong understanding of cloud data architectures (AWS S3/Glue/EMR, Azure ADLS/ADF/Databricks, or Google Cloud Platform BigQuery/Dataproc).
Solid SQL skills and experience with relational and NoSQL databases.
Experience with version control (Git) and CI/CD tools (Jenkins, Azure DevOps, GitHub Actions).
Hands-on experience with Airflow, ADF, or other orchestration and scheduling tools.
Familiarity with data modeling, data governance, and best practices for data quality (an illustrative data-quality check follows this list).
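As a small illustration of the data-quality practices mentioned above, the sketch below gates a pipeline run on null or duplicate keys. The path and the event_id key column are assumptions, not specifics of this role.

```python
# Illustrative data-quality gate in PySpark: fail the run on null or duplicate
# primary keys. The input path and key column are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
df = spark.read.parquet("s3://example-curated-bucket/events/")

null_keys = df.filter(F.col("event_id").isNull()).count()
duplicates = df.count() - df.dropDuplicates(["event_id"]).count()

if null_keys > 0 or duplicates > 0:
    raise ValueError(
        f"Data quality gate failed: {null_keys} null keys, {duplicates} duplicates"
    )
```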