Compunnel

Data Engineer (Databricks)

Compunnel, San Jose, California, United States, 95199


Job Summary

We are seeking a skilled Data Engineer with expertise in Databricks and Apache Spark to design and develop scalable, fault-tolerant real-time data pipelines.

The ideal candidate will work with various data sources and technologies to implement complex transformations, optimize performance, and collaborate on big data architecture and streaming solutions.

Key Responsibilities

Design and Development

Design, develop, and optimize real-time data pipelines using Apache Spark, Scala, and Spark Streaming/Structured Streaming (see the sketch after this list).

Implement complex data transformations, aggregations, and business logic.

Develop reliable Kafka consumers and producers for real-time data ingestion.

Integrate with data sources and sinks including Kafka, relational databases, NoSQL databases (e.g., MongoDB, Cassandra), and cloud storage (e.g., S3, ADLS, GCS).
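For illustration, the sketch below shows the shape of such a pipeline in Scala: it reads records from Kafka with Structured Streaming, applies a windowed aggregation, and writes the results to a sink. The broker address, topic name, and checkpoint path are placeholder assumptions, and the console sink stands in for a production target such as a Delta table or database.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object EventPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("event-pipeline")
      .getOrCreate()
    import spark.implicits._

    // Read a stream of records from Kafka; broker address and topic are placeholders.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()

    // Kafka delivers key/value as binary; cast the value to a string payload.
    // In a real pipeline the payload would feed further transformations.
    val events = raw.selectExpr("CAST(value AS STRING) AS payload", "timestamp")

    // Example transformation: count events per 1-minute window,
    // tolerating up to 5 minutes of late data via the watermark.
    val counts = events
      .withWatermark("timestamp", "5 minutes")
      .groupBy(window($"timestamp", "1 minute"))
      .count()

    // Write the aggregates out; a console sink keeps the sketch self-contained.
    val query = counts.writeStream
      .outputMode("update")
      .format("console")
      .option("checkpointLocation", "/tmp/checkpoints/event-pipeline")
      .start()

    query.awaitTermination()
  }
}

In production the watermark, output mode, and sink would be chosen to match the latency and correctness requirements of the pipeline.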

Performance Tuning & Optimization Analyze and optimize Spark jobs for performance and resource efficiency. Resolve bottlenecks to ensure high throughput and low latency. Architecture & Collaboration

Architecture & Collaboration

Collaborate with data architects, data scientists, and engineering teams to translate data requirements into technical solutions.

Contribute to the architecture and design of big data platforms and streaming solutions.

Participate in code reviews and ensure adherence to best practices and security protocols.

Testing & Deployment

Develop and execute unit, integration, and end-to-end tests for Spark applications.

Automate deployment processes and contribute to CI/CD pipelines.

Monitor and troubleshoot production issues for timely resolution.
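
To make the testing expectation concrete, here is a minimal sketch of a Spark unit test, assuming ScalaTest (a common choice, not one stated in the posting): a local SparkSession runs a small transformation against in-memory data. The column names and logic are illustrative only.

import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

class TransformSpec extends AnyFunSuite {
  // A local SparkSession is enough to exercise transformation logic in tests.
  private val spark = SparkSession.builder()
    .master("local[2]")
    .appName("transform-spec")
    .getOrCreate()
  import spark.implicits._

  test("active records are kept and amounts are summed per account") {
    val input = Seq(
      ("a1", "ACTIVE", 10L),
      ("a1", "ACTIVE", 5L),
      ("a2", "CLOSED", 7L)
    ).toDF("accountId", "status", "amount")

    // The transformation under test; in a real codebase this would live
    // in a shared module rather than inline in the spec.
    val result = input
      .filter($"status" === "ACTIVE")
      .groupBy($"accountId")
      .sum("amount")
      .collect()

    assert(result.length == 1)
    assert(result.head.getLong(1) == 15L)
  }
}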

Required Qualifications

Expert proficiency in Scala programming.

Strong expertise in Apache Spark, especially Spark Streaming/Structured Streaming.

Experience with big data platforms such as Hortonworks, Hadoop, or Cloudera.

Solid understanding of distributed computing and big data architectures.

Experience with Apache Kafka for messaging and stream processing.

Proficiency in SQL and experience with relational and NoSQL databases.

Familiarity with cloud platforms (AWS, Azure, GCP) and their big data services.

Experience with version control systems such as Git.

Knowledge of build tools such as SBT (a minimal build definition is sketched below).
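
Since SBT is named, a minimal build.sbt for a Spark streaming project might look like the following. The project name, Scala version, and library versions are illustrative assumptions, not requirements from the posting.

// build.sbt -- versions below are illustrative, not prescribed by the role.
ThisBuild / scalaVersion := "2.12.18"

lazy val root = (project in file("."))
  .settings(
    name := "streaming-pipeline",
    libraryDependencies ++= Seq(
      // "provided" because the cluster supplies Spark at runtime.
      "org.apache.spark" %% "spark-sql" % "3.5.1" % Provided,
      "org.apache.spark" %% "spark-sql-kafka-0-10" % "3.5.1",
      "org.scalatest" %% "scalatest" % "3.2.18" % Test
    )
  )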

Education:

Bachelor's Degree