OSI Engineering
Our client is scaling production ML systems and needs a hands-on engineer to help build, maintain, and run essential ML data pipelines. You'll own high-throughput data ingestion and transformation workflows (including image- and array-type modalities), enforce rigorous data quality standards, and partner with research and platform teams to keep models fed with reliable, versioned datasets.

Responsibilities:
- Design, build, and operate reliable ML data pipelines for batch and/or streaming use cases across cloud environments.
- Develop robust ETL/ELT processes (ingest, validate, cleanse, transform, and publish) with clear SLAs and monitoring.
- Implement data quality gates (schema checks, null/outlier handling, drift and bias signals) and data versioning for reproducibility.
- Optimize pipelines for distributed computing and large modalities (e.g., images, multi-dimensional arrays).
- Automate repetitive workflows with CI/CD and infrastructure-as-code; document, test, and harden for production.
- Collaborate with ML, Data Science, and Platform teams to align datasets, features, and model training needs.

Minimum Qualifications:
- 5+ years building and operating data pipelines in production.
- Cloud: Hands-on with AWS, Azure, or GCP services for storage, compute, orchestration, and security.
- Programming: Strong proficiency in Python and common data/ML libraries (pandas, NumPy, etc.).
- Distributed compute: Experience with at least one of Spark, Dask, or Ray.
- Modalities: Experience handling image-type and array-type data at scale.
- Automation: Proven ability to automate repetitive tasks (shell/Python scripting, CI/CD).
- Data Quality: Implemented validation, cleansing, and transformation frameworks in production.
- Data Versioning: Familiar with tools/practices such as DVC, LakeFS, or similar.
- Languages: Fluent in English or Farsi.

Strongly Preferred:
- SQL expertise (writing performant queries; optimizing on large datasets).
- Data warehousing/lakehouse concepts and tools (e.g., Snowflake/BigQuery/Redshift; Delta/Lakehouse patterns).
- Data virtualization/federation exposure (e.g., Presto/Trino) and semantic/metadata layers.
- Orchestration (Airflow, Dagster, Prefect) and observability/monitoring for data pipelines.
- MLOps practices (feature stores, experiment tracking, lineage, artifacts).
- Containers & IaC (Docker; Terraform/CloudFormation) and CI/CD for data/ML workflows.
- Testing for data/ETL (unit/integration tests, great_expectations or similar).

Soft Skills:
- Executes independently and creatively; comfortable owning outcomes in ambiguous environments.
- Proactive communicator who collaborates cross-functionally with DS/ML/Platform stakeholders.

Location: Seattle, WA
Duration: 1+ year
Pay: $56/hr