TurbineOne

Senior Data Scientist - Machine Learning Data Operations

TurbineOne, San Francisco, California, United States, 94199

About the Job

Company: TurbineOne is the frontline perception company. We deliver decision advantage, better situational awareness, and stronger force protection. The company is a small, fast-moving startup backed by defense tech venture capitalists. TurbineOne is deployed by every branch of the Department of Defense to solve critical missions.

Reports to the Machine Learning team lead

Geographically flexible (home office)

Responsibilities

Ingesting, organizing, and maintaining large-scale training datasets from open-source resources and contract-specific artifacts

Creating and managing data cataloging systems to ensure datasets are findable, accessible, and ready for ML training pipelines

Designing and implementing data labeling workflows, including managing external labeling vendors and quality assurance processes

Building and maintaining YOLO-style manifests and annotation formats for custom computer vision datasets

Performing data cleaning, validation, and augmentation to ensure high-quality training data

Conducting exploratory data analysis and generating insights about dataset characteristics, biases, and coverage gaps

Supporting the ML research team with statistical analysis, experiment design, and model evaluation

Developing data pipelines and automation tools for continuous data ingestion and processing

Collaborating with ML engineers to optimize data loading and preprocessing for training efficiency

On a Typical Day You Would

Process incoming datasets from various sources, performing quality checks and organizing them into our data management system

Create or review annotation schemas and coordinate with labeling teams to ensure consistent, high-quality labels

Write Python scripts to clean, transform, and validate datasets for specific ML training requirements

Analyze dataset statistics and create visualizations to identify potential issues or opportunities for improvement

Collaborate with the ML research lead to design experiments and evaluate model performance across different data splits

Document dataset characteristics, versioning, and lineage to maintain reproducibility and compliance

Desired Experience

High standard of ethics, grit, integrity and moral character

5+ years of experience in data science, analytics, or a related field, with a focus on ML data preparation

Strong foundation in probability, statistics, and experimental design

Bachelor's degree in Statistics, Mathematics, Computer Science, or a related quantitative field (Master's preferred)

Proficiency with the Python data stack: Pandas, NumPy, Jupyter Notebooks, and data visualization libraries

Experience with ML frameworks (PyTorch, Scikit-learn) and familiarity with training workflows

Hands-on experience with computer vision datasets and annotation formats (COCO, YOLO, Pascal VOC)

Experience managing data labeling projects and working with annotation tools (Label Studio, CVAT, or similar)

Familiarity with open-source ML models and experience applying them to real-world problems

Strong SQL skills and experience with data warehousing concepts

Experience with version control (Git) and collaborative development practices

Excellent communication skills for coordinating with technical and non-technical stakeholders

Meticulous attention to detail and strong organizational skills for managing complex datasets

Willingness to embrace the Startup Culture of moving fast, being insatiably curious, celebrating often, embracing uncertainty, and having a personal desire to improve other people's lives

Nice to Have

Experience with defense or security-related datasets

Knowledge of edge computing constraints and data optimization techniques

Experience with distributed data processing frameworks (Spark, Dask)

Familiarity with MLOps practices and tools

Background in specific domains relevant to perception systems (satellite imagery, sensor fusion, etc.)

Startup Culture Expectations

We're a small, fully remote team and everything is our responsibility

Our team thrives on autonomy, trust and solid communication

Everyone on the team needs to be very comfortable with constant change, moving fast, sharing failures, embracing grit, and building things themselves

Eligibility

Must be eligible to obtain a clearance with the U.S. government

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Engineering and Information Technology

Industries

Defense & Space
