OSI Engineering

Data Engineer

OSI Engineering, Seattle, Washington, United States

A globally leading technology company is seeking an experienced Data Engineer to support large-scale data operations for machine learning workflows. You will work closely with external data vendors and internal teams to ingest, validate, curate, and organize high-quality datasets, enabling downstream ML model development. This role requires a strong background in Python and experience working with AWS S3-based pipelines. All qualified candidates are welcome to apply!

Job Responsibilities:
• Collaborate with external data collection vendors to track and ingest incoming datasets.
• Design and execute robust data validation and curation pipelines to ensure data quality and consistency.
• Implement logic to bin and categorize data according to project-specific criteria.
• Run pseudo-labeling workflows on newly ingested data using pre-trained ML models.
• Maintain clear status and versioning of datasets throughout their lifecycle.
• Distribute and deliver validated data assets to various internal product and ML teams.
• Maintain logs and reports to ensure traceability and accountability across data operations.

Candidate Requirements:
• 5+ years of industry experience in data engineering, data pipelines, or ML infrastructure.
• Strong proficiency in Python, including data processing and scripting.
• Experience working with AWS S3 for managing and organizing large-scale datasets.
• Familiarity with data quality assurance and curation processes.
• Comfortable operating in Unix/Linux environments and familiar with command-line tools.
• Strong communication and coordination skills, especially when collaborating with external vendors and distributed teams.
• Self-driven, organized, and able to handle multiple data workflows in parallel.

Nice to Have:
• Experience with ML pipelines, especially pseudo-labeling or active learning.
• Familiarity with data versioning tools or frameworks (e.g., DVC, LakeFS).
• Prior experience in managing vendor relationships or annotation workflows.
• Ability to speak multiple languages.

Type: Contract
Duration: 12 months (with a possibility to extend)
Work Location: Sunnyvale, CA (On site)
Pay Rate: $68.00 - $83.00 (DOE)