ExecutivePlacements.com
Data Integration Engineer – Irvine, California
We are seeking a Data Integration Engineer to work onsite in Irvine, California. We only consider U.S. citizens or Green Card holders. Relocation is possible, but we prefer local candidates.
Responsibilities
Design and implement batch and streaming pipelines in Apache Spark running on Kubernetes and Kubeflow Pipelines to hydrate feature stores and training datasets.
Build high throughput ETL/ELT jobs with SSIS, SSAS, and T SQL against MS SQL Server, applying Data Vault style modeling patterns for auditability.
Integrate source control, build, and release automation using GitHub Actions and Azure DevOps for every pipeline component.
Instrument pipelines with Prometheus exporters and visualize SLA, latency, and error budget metrics to enable proactive alerting.
Create automated data quality and schema drift checks; surface anomalies to support a rapid incident response process.
Use MLflow Tracking and Model Registry to version artifacts, parameters, and metrics for reproducible experiments and safe rollbacks.
Work with data scientists to automate model retraining and deployment triggers within Kubeflow based on data freshness or concept drift signals.
Develop PowerShell and .NET utilities to orchestrate job dependencies, manage secrets, and publish telemetry to Azure Monitor.
Optimize Spark and SQL workloads through indexing, partitioning, and cluster sizing strategies, benchmarking performance in CI pipelines.
Document lineage, ownership, and retention policies; ensure pipelines conform to PCI/SOX and internal data governance standards.
Qualifications
At least 6 years of experience building data pipelines in Spark or equivalent.
At least 2 years deploying workloads on Kubernetes/Kubeflow.
At least 2 years of experience with MLflow or similar experiment‑tracking tools.
At least 6 years of experience in TSQL, Python/Scala for Spark.
At least 6 years of PowerShell/.NET scripting.
At least 6 years of experience with GitHub, Azure DevOps, Prometheus, Grafana, and SSIS/SSAS.
Kubernetes CKA/CKAD, Azure Data Engineer (DP203), or MLOps‑focused certifications (e.g., Kubeflow or MLflow) would be great to see.
Mentor engineers on best practices in containerized data engineering and MLOps.
Seniority Level
Mid‑Senior level
Employment Type
Full‑time
Job Function
Information Technology
#J-18808-Ljbffr
Responsibilities
Design and implement batch and streaming pipelines in Apache Spark running on Kubernetes and Kubeflow Pipelines to hydrate feature stores and training datasets.
Build high throughput ETL/ELT jobs with SSIS, SSAS, and T SQL against MS SQL Server, applying Data Vault style modeling patterns for auditability.
Integrate source control, build, and release automation using GitHub Actions and Azure DevOps for every pipeline component.
Instrument pipelines with Prometheus exporters and visualize SLA, latency, and error budget metrics to enable proactive alerting.
Create automated data quality and schema drift checks; surface anomalies to support a rapid incident response process.
Use MLflow Tracking and Model Registry to version artifacts, parameters, and metrics for reproducible experiments and safe rollbacks.
Work with data scientists to automate model retraining and deployment triggers within Kubeflow based on data freshness or concept drift signals.
Develop PowerShell and .NET utilities to orchestrate job dependencies, manage secrets, and publish telemetry to Azure Monitor.
Optimize Spark and SQL workloads through indexing, partitioning, and cluster sizing strategies, benchmarking performance in CI pipelines.
Document lineage, ownership, and retention policies; ensure pipelines conform to PCI/SOX and internal data governance standards.
Qualifications
At least 6 years of experience building data pipelines in Spark or equivalent.
At least 2 years deploying workloads on Kubernetes/Kubeflow.
At least 2 years of experience with MLflow or similar experiment‑tracking tools.
At least 6 years of experience in TSQL, Python/Scala for Spark.
At least 6 years of PowerShell/.NET scripting.
At least 6 years of experience with GitHub, Azure DevOps, Prometheus, Grafana, and SSIS/SSAS.
Kubernetes CKA/CKAD, Azure Data Engineer (DP203), or MLOps‑focused certifications (e.g., Kubeflow or MLflow) would be great to see.
Mentor engineers on best practices in containerized data engineering and MLOps.
Seniority Level
Mid‑Senior level
Employment Type
Full‑time
Job Function
Information Technology
#J-18808-Ljbffr