Purple Drive
Key Responsibilities:
Cloud Data Engineering & Architecture
Design, build, and manage data pipelines and analytics workflows using AWS EMR, EKS, and Databricks (DBX).
Implement best practices for distributed data processing, storage, and compute scalability.
Ensure reliability, availability, and performance of big data solutions in production.
Infrastructure Automation & CI/CD
Develop and maintain Infrastructure-as-Code (IaC) using Terraform to provision and manage AWS resources.
Automate cluster deployments, scaling policies, and monitoring configurations.
Integrate pipelines with CI/CD tools for seamless deployment and release management.
Programming & Data Processing
Develop data transformation, processing, and analytics applications using Scala, Python, and Java.
Optimize Spark jobs for performance, scalability, and cost efficiency (a short sketch follows this list).
Write reusable, testable, and efficient code following software engineering best practices.
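
As an illustration of the kind of Spark tuning this responsibility covers, here is a minimal Scala sketch: it broadcasts a small dimension table to avoid a full shuffle join, sizes shuffle parallelism to the cluster rather than Spark's default of 200, and coalesces output to limit small files. The S3 paths, column name, and partition counts are hypothetical placeholders, not project specifics.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object JoinOptimizationSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("join-optimization-sketch")
          // Size shuffle parallelism to the cluster instead of the 200 default.
          .config("spark.sql.shuffle.partitions", "64")
          .getOrCreate()

        // Hypothetical S3 locations; substitute real buckets and prefixes.
        val events    = spark.read.parquet("s3://example-bucket/events/")
        val countries = spark.read.parquet("s3://example-bucket/dim_country/")

        // Broadcasting the small dimension table turns a shuffle join
        // into a map-side join, cutting network traffic and runtime.
        val enriched = events.join(broadcast(countries), Seq("country_code"))

        // Coalesce before writing so the job does not emit hundreds of tiny files.
        enriched.coalesce(16)
          .write
          .mode("overwrite")
          .parquet("s3://example-bucket/events_enriched/")

        spark.stop()
      }
    }

Broadcast joins pay off only when the dimension side comfortably fits in executor memory; for two large tables, bucketing or adaptive query execution is the usual lever instead.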
Collaboration & Stakeholder Engagement
Collaborate with Data Scientists, Analysts, and Product teams to enable self-service analytics and ML model operationalization.
Work closely with DevOps and Cloud Engineering teams to ensure compliance, observability, and security.
Document designs, workflows, and standards for knowledge sharing and future scaling.
Required Skills & Qualifications:
Cloud & Big Data: Strong hands-on experience with AWS EMR, AWS EKS, Databricks (DBX).
IaC & Automation: Proficiency in Terraform for managing cloud infrastructure.
Programming: Strong coding skills in Scala, Python, and Java.
Distributed Computing: Deep knowledge of Apache Spark, Hadoop ecosystem, and data processing frameworks.
Containers & Orchestration: Experience with Kubernetes (EKS) for containerized workloads.
Data Pipelines: Expertise in building ETL/ELT pipelines for structured and unstructured data (a short sketch follows this list).
Version Control & CI/CD: Experience with Git, Jenkins, GitLab CI/CD, or similar tools.
Problem Solving: Strong analytical, debugging, and optimization skills.
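
To give the ETL/ELT expectation a concrete shape, the following is a minimal batch-pipeline sketch in Scala on Spark: extract semi-structured JSON, drop malformed rows, and load date-partitioned Parquet for analytics. The paths, column names, and schema are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, to_date}

    object DailyOrdersEtl {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("daily-orders-etl").getOrCreate()

        // Extract: semi-structured JSON from a hypothetical landing zone.
        val raw = spark.read.json("s3://example-bucket/landing/orders/")

        // Transform: reject malformed rows and derive the partition column.
        val cleaned = raw
          .filter(col("order_id").isNotNull)
          .withColumn("order_date", to_date(col("created_at")))

        // Load: columnar, date-partitioned output for downstream analytics.
        cleaned.write
          .mode("append")
          .partitionBy("order_date")
          .parquet("s3://example-bucket/curated/orders/")

        spark.stop()
      }
    }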
Preferred Qualifications:
Experience with streaming technologies (Kafka, Kinesis, Flink, Spark Streaming); see the sketch after this list.
Familiarity with data governance, lineage, and security best practices on AWS.
Knowledge of machine learning pipeline integration in Databricks.
Experience with monitoring and observability tools (Prometheus, Grafana, CloudWatch).
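
To make the streaming qualification concrete, here is a minimal Spark Structured Streaming sketch in Scala that reads from Kafka and maintains windowed counts with a watermark for late data. The broker address, topic, event schema, and checkpoint path are hypothetical, and the job assumes the spark-sql-kafka connector is on the classpath.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, from_json, window}
    import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

    object ClickstreamAggregator {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("clickstream-aggregator").getOrCreate()
        import spark.implicits._

        // Hypothetical event schema for the incoming Kafka messages.
        val schema = new StructType()
          .add("user_id", StringType)
          .add("event_time", TimestampType)

        val clicks = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092") // placeholder address
          .option("subscribe", "clicks")                    // placeholder topic
          .load()
          .select(from_json(col("value").cast("string"), schema).as("e"))
          .select("e.*")

        // Five-minute tumbling windows per user, tolerating
        // up to ten minutes of late-arriving events.
        val counts = clicks
          .withWatermark("event_time", "10 minutes")
          .groupBy(window($"event_time", "5 minutes"), $"user_id")
          .count()

        counts.writeStream
          .outputMode("update")
          .format("console") // swap for a Kafka, Delta, or Parquet sink in practice
          .option("checkpointLocation", "/tmp/clickstream-checkpoint")
          .start()
          .awaitTermination()
      }
    }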
Education & Experience:
Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
5+ years of experience in Data Engineering, Big Data, or Cloud Engineering roles.
Proven track record of delivering production-grade data platforms at scale.