Kyyba

Data Engineering Engineer 2

Kyyba, Dearborn, Michigan, United States, 48120

Employees in this job function are responsible for designing, building, and maintaining data solutions, including data infrastructure and pipelines, for collecting, storing, processing, and analyzing large volumes of data efficiently and accurately.

Key Responsibilities:
1) Collaborate with business and technology stakeholders to understand current and future data requirements
2) Design, build, and maintain reliable, efficient, and scalable data infrastructure for data collection, storage, transformation, and analysis
3) Plan, design, build, and maintain scalable data solutions, including data pipelines, data models, and applications, for efficient and reliable data workflows
4) Design, implement, and maintain existing and future data platforms, such as data warehouses, data lakes, and data lakehouses, for structured and unstructured data
5) Design and develop analytical tools, algorithms, and programs to support data engineering activities such as writing scripts and automating tasks
6) Ensure optimum performance and identify improvement opportunities

Skills Required: Google Cloud Platform, ETL, Apache Spark, Data Architecture, Python, SQL, Kafka

Skills Preferred: Java, PowerShell, Data Acquisition, Data Analysis, Data Collection, Data Conversion, Data Integrity, Data/Analytics dashboards

Experience Required: Engineer 2: 4+ years of data engineering work experience

Experience Preferred:
• Data Pipeline Architecture & Development: Design, build, and maintain highly scalable, fault-tolerant, and performant data pipelines to ingest and process data from 10+ siloed sources, including both structured and unstructured formats.
• ML-Driven ETL Implementation: Operationalize ETL pipelines for intelligent data ingestion, automated cataloging, and sophisticated normalization of diverse datasets.
• Unified Data Model Creation: Architect and implement a unified data model capable of connecting all relevant data elements across various sources, optimized for efficient querying and insight generation by AI agents and chatbot interfaces.
• Big Data Processing: Utilize advanced distributed processing frameworks (Apache Beam, Apache Spark, Google Cloud Dataflow) to handle large-scale data transformations and data flow.
• Cloud-Native Data Infrastructure: Leverage GCP services to build and manage robust data storage, processing, and orchestration layers.
• Data Quality, Governance & Security: Implement rigorous data quality gates, validation rules, bad record handling, and comprehensive logging. Ensure strict adherence to data security policies, IAM role management, and GCP perimeter security.
• Automation & Orchestration: Develop shell scripts and Cloud Build YAMLs, and utilize Cloud Scheduler/PubSub for end-to-end automation of data pipelines and infrastructure provisioning.
• Collaboration with AI/ML Teams: Work closely with AI/ML engineers, data scientists, and product managers to understand data requirements, integrate data solutions with multi-agentic systems, and optimize data delivery for chatbot functionalities.
• Testing & CI/CD: Implement robust testing strategies, maintain high code quality through active participation in Git/GitHub, perform code reviews, and manage CI/CD pipelines via Cloud Build.
• Performance Tuning & Optimization: Continuously monitor, optimize, and troubleshoot data pipelines and BigQuery performance using techniques like table partitioning, clustering, and sharding.

Education Required: Bachelor's Degree

Education Preferred: Certification Program

Additional Information: 4 days in the office.