Capgemini

Big Data Developer (Alpharetta)

Capgemini, Alpharetta, Georgia, United States, 30239

We're looking for a seasoned Senior Data Engineer with strong Hadoop experience to design, build, and scale data pipelines and platforms powering analytics, AI/ML, and business operations. You'll own end-to-end data engineering, from ingestion and transformation to performance optimization, across large-scale distributed systems and modern cloud data platforms.

Key Responsibilities

Design & Build Data Pipelines: Architect, develop, and maintain robust ETL/ELT pipelines for batch and streaming data using the Hadoop ecosystem, Spark, and Airflow.

Big Data Architecture: Define and implement scalable big data architectures, ensuring reliability, fault tolerance, and cost efficiency.

Data Modeling: Develop and optimize data models for the Data Warehouse and Operational Data Store (ODS); ensure conformed dimensions and star/snowflake schemas where appropriate.

SQL Expertise: Write, optimize, and review complex SQL/HiveQL queries for large datasets; enforce query standards and patterns.

Performance Tuning: Optimize Spark jobs, SQL queries, storage formats (e.g., Parquet/ORC), partitioning, and indexing to improve latency and throughput.

Data Quality & Governance: Implement data validation, lineage, cataloging, and security controls across environments.

Workflow Orchestration: Build and manage DAGs in Airflow, ensuring observability, retries, alerting, and SLAs.

Cross-functional Collaboration: Partner with Data Science, Analytics, and Product teams to deliver reliable datasets and features.

Best Practices: Champion coding standards, CI/CD, infrastructure-as-code (IaC), and documentation across the data platform.

Required Qualifications

7+ years of hands-on data engineering experience building production-grade pipelines.

Strong experience with Hadoop (HDFS, YARN), Hive SQL/HiveQL, Spark (Scala/Java/PySpark), and Airflow.

Expert-level SQL skills with the ability to write and tune complex queries on large datasets.

Solid understanding of Big Data architecture patterns (e.g., lakehouse, data lake + warehouse, CDC).

Deep knowledge of ETL/ELT and DW/ODS concepts (slowly changing dimensions, partitioning, columnar storage, incremental loads).

Proven track record in performance tuning for large-scale systems (Spark jobs, shuffle optimizations, broadcast joins, skew handling).

Strong programming background in Java and/or Scala (Python is a plus).

Preferred Skills

Experience with AI-driven data processing (feature engineering pipelines, ML-ready datasets, model data dependencies).

Hands-on with cloud data platforms (AWS, GCP, or Azure): services like EMR/Dataproc/HDInsight, S3/GCS/ADLS, Glue/Dataflow, BigQuery/Snowflake/Redshift/Synapse.

Exposure to NoSQL databases (Cassandra, HBase, DynamoDB, MongoDB).

Advanced data governance & security (row/column-level security, tokenization, encryption at rest/in transit, IAM/RBAC, data lineage/catalog).

Familiarity with Kafka (topics, partitions, consumer groups, schema registry, stream processing).

Experience with CI/CD for data (Git, Jenkins/GitHub Actions, Terraform) and containerization (Docker, Kubernetes).

Knowledge of metadata management and data observability (Great Expectations, Monte Carlo, OpenLineage).

Life at Capgemini

Capgemini supports all aspects of your well-being throughout the changing stages of your life and career. For eligible employees, we offer:

Flexible work
Healthcare including dental, vision, mental health, and well-being programs
Financial well-being programs such as 401(k) and Employee Share Ownership Plan
Paid time off and paid holidays
Paid parental leave
Family building benefits like adoption assistance, surrogacy, and cryopreservation
Social well-being benefits like subsidized back-up child/elder care and tutoring
Mentoring, coaching and learning programs
Employee Resource Groups
Disaster Relief

Disclaimer: Capgemini is an Equal Opportunity Employer encouraging diversity in the workplace. All qualified applicants will receive consideration for employment without regard to race, national origin, gender identity/expression, age, religion, disability, sexual orientation, genetics, veteran status, marital status or any other characteristic protected by law. This is a general description of the Duties, Responsibilities and Qualifications required for this position. Physical, mental, sensory or environmental demands may be referenced in an attempt to communicate the manner in which this position traditionally is performed. Whenever necessary to provide individuals with disabilities an equal employment opportunity, Capgemini will consider reasonable accommodations that might involve varying job requirements and/or changing the way this job is performed, provided that such accommodations do not pose an undue hardship. Capgemini is committed to providing reasonable accommodations during our recruitment process. If you need assistance or accommodation, please reach out to your recruiting contact. Click the following link for more information on your rights as an Applicant

http://www.capgemini.com/resources/equal-employment-opportunity-is-the-law