HiringAgents.ai

Senior/Staff Software Engineer - ML Infrastructure

HiringAgents.ai, San Francisco, California, United States, 94199

Base pay range $200,000.00/yr - $250,000.00/yr

About The Role

Industrial labor is dangerous work: almost 3 million people in the US are injured in the workplace each year, often from entirely preventable causes, and some of those injuries are fatal or debilitating. Protecting the essential people who power our world is what motivates Voxel, and Voxel would love for you to join them.

Voxel is transforming workplace safety and operations with a full‑stack AI and computer vision platform that powers site intelligence for leading enterprises across grocery and retail, manufacturing, warehousing, supply chain, and logistics. Based in San Francisco and backed by top‑tier VCs, Voxel’s technology helps safety and operations leaders see unseen risks, make better decisions, and prevent incidents before they happen.

As a Staff Machine Learning Infrastructure Engineer, you will own three core pillars of Voxel’s computer‑vision platform: ground‑truth data and labeling workflows, large‑scale training infrastructure, and continuous model lifecycle management. You’ll design and operate cloud‑native, distributed systems that turn raw video into production‑ready, version‑controlled models. You’ll work closely with ML researchers and engineers, providing technical leadership and building the infrastructure that lets them iterate quickly, safely, and at scale.

Responsibilities

Own data and labeling pipelines: architect scalable labeling services (storage, query, retrieval), design ontologies, automate annotation workflows, and build quality‑tiered datasets within cost constraints

Build and operate training infrastructure: create multi‑GPU/multi‑node training frameworks (e.g., Ray, Spark, Kubernetes), optimize distributed jobs, and integrate acceleration techniques (TensorRT, CUDA Graphs, FP8, etc.)

Manage the full model lifecycle: implement model registries, version control, evaluation suites, and continuous‑learning loops to push updates from dev → staging → prod with safe rollbacks

Provide technical leadership, mentorship, and lightweight project management for a small infra + research squad

Establish DevOps‑for‑ML best practices (IaC, CI/CD, observability, cost monitoring) and partner with ML engineers on architecture decisions from data schemas to inference optimizations

Requirements

Must be based in the San Francisco Bay Area, California, United States, with the ability to work on‑site at Voxel’s San Francisco office

At least 5 years of professional experience building and operating large‑scale infrastructure, including at least 3 years focused on ML or other data‑intensive systems

Bachelor’s degree or higher in Computer Science, Electrical Engineering, or a closely related technical field

Hands‑on experience designing and operating highly available, distributed systems on Kubernetes (e.g., EKS, GKE, or on‑prem clusters)

Practical experience with ML or data infrastructure, including automating data‑labeling or ground‑truth workflows and maintaining dataset versioning

Practical experience with modern DevOps for ML, including infrastructure‑as‑code (e.g., Terraform or AWS CDK), CI/CD pipelines (e.g., GitHub Actions or ArgoCD), and metrics/alerting tooling (e.g., Prometheus and Grafana)

Preferred Skills

Experience running multi‑instance or multi‑GPU training jobs and applying mixed‑precision optimizations or TensorRT/Triton inference

Background with model registry tooling (e.g., MLflow, BentoML, or SageMaker Model Registry) and associated evaluation dashboards

Prior work with computer‑vision models (e.g., YOLO, DETR, Faster R‑CNN) or video understanding systems at scale

Experience shipping high‑quality production code in Python in ML or infrastructure‑heavy environments

Familiarity with active‑learning, continuous‑training, or online distillation pipelines

Exposure to edge deployment or real‑time inference systems

Why join Voxel?

Join a visionary team revolutionizing safety and operations, directly impacting the well‑being of millions of essential workers. This is your opportunity to build an extraordinary business and foster a vibrant company culture that demands your absolute best. You’ll work alongside AI experts, experienced entrepreneurs, and passionate problem‑solvers, playing a pivotal role in shaping Voxel’s growth trajectory and market position.

Voxel Offers

Extensive health, dental, and vision insurance

Highly competitive paid parental leave and support

Ownership through an Equity Incentive Plan

Generous paid time off and flexible work arrangements

Daily meals in‑office, vibrant company events, and team‑building

401(k) retirement plan, HSA options, and pre‑tax commuter benefits
