Purple Drive
Preferred location is Sunnyvale, CA, although candidates from Bentonville, AR will also be considered.
Job Description:
We are seeking a skilled Data Engineer with strong expertise in Big Data, cloud platforms, and distributed data systems. The ideal candidate will have hands-on experience in designing, building, and optimizing data pipelines, API integrations, and real-time stream-processing systems.
Key Responsibilities:
- Design, develop, and optimize large-scale data pipelines and ETL workflows.
- Work with Java, Python, and Scala to build scalable data solutions.
- Develop APIs and integrate with systems using Node.js, GraphQL, and RESTful services.
- Implement big data solutions leveraging Hadoop, Hive, Spark (Scala), Presto/Trino, and Data Lake architectures.
- Deploy and manage workflows with Airflow, Luigi, Automic, and similar orchestration tools.
- Build and maintain real-time data streaming systems using Storm, Spark Streaming, and Kafka (see the illustrative sketch after this list).
- Utilize Vertex AI and cloud services (AWS/GCP/Azure) for advanced analytics and ML integration.
- Ensure system reliability, scalability, and performance in distributed environments.
- Collaborate with cross-functional teams (data scientists, analysts, and engineers) to deliver high-quality data solutions.
- Apply best practices in CI/CD, Kubernetes-based deployments, and monitoring.
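To give candidates a concrete sense of the streaming work described above, here is a minimal, illustrative Spark Structured Streaming sketch in Scala that reads events from a Kafka topic and writes hourly aggregates to a Data Lake path. The broker address, topic name, event schema, and storage paths are hypothetical placeholders rather than details of our actual systems, and the sketch assumes the spark-sql-kafka connector is on the classpath.

```scala
// Minimal sketch (illustrative only): a Spark Structured Streaming job that reads
// from a hypothetical Kafka topic and writes hourly aggregates to a Data Lake path.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object ClickstreamAggregator {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clickstream-aggregator")
      .getOrCreate()
    import spark.implicits._

    // Assumed event schema; replace with the real data contract.
    val eventSchema = new StructType()
      .add("user_id", StringType)
      .add("url", StringType)
      .add("event_time", TimestampType)

    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // assumed broker address
      .option("subscribe", "clickstream")               // assumed topic name
      .load()
      .select(from_json($"value".cast("string"), eventSchema).as("e"))
      .select("e.*")

    // Hourly page-view counts, with a watermark to bound late-arriving data.
    val hourlyCounts = events
      .withWatermark("event_time", "15 minutes")
      .groupBy(window($"event_time", "1 hour"), $"url")
      .count()

    hourlyCounts.writeStream
      .format("parquet")
      .option("path", "s3a://example-lake/clickstream_hourly/")                 // assumed lake path
      .option("checkpointLocation", "s3a://example-lake/_checkpoints/hourly/")  // assumed checkpoint path
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```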
Required Skills:
- Strong programming skills in Java, Python, and Scala.
- Expertise in big data frameworks: Hadoop, Hive, Spark (Scala).
- Hands-on experience with API development (REST, GraphQL, Node.js).
- Experience with stream-processing tools: Kafka, Storm, Spark Streaming.
- Proficiency with workflow orchestration: Airflow, Luigi, Automic.
- Knowledge of Presto/Trino and distributed SQL query engines (see the query sketch after this list).
- Cloud experience (AWS, GCP, or Azure), with exposure to Vertex AI.
- Strong understanding of Data Lake and data warehousing concepts.
- Experience with Kubernetes for container orchestration.