Pear VC

Member of Technical Staff, Machine Learning

Pear VC, Austin, Texas, US, 78716

About NomadicML

Americans drive over 5 trillion miles a year, more than 500 billion of them recorded. Buried in that footage is the next frontier of machine intelligence. At NomadicML, we’re building the platform that unlocks it. Our Vision-Language Models (VLMs) act as the new “hydraulic mining” for video, transforming raw footage into structured intelligence that powers real-world autonomy and robotics. We partner with industry leaders across self-driving, robotics, and industrial automation to mine insights from petabytes of data that were once unusable.

NomadicML was founded by Mustafa Bal and Varun Krishnan, who met at Harvard University while studying Computer Science. Mustafa is a core contributor to ONNX Runtime and DeepSpeed, with deep expertise in distributed systems and large-scale model training infrastructure. Varun is an INFORMS Wagner Prize Finalist for his research in large-scale driver navigation AI models and one of the top chess players in the US.

Our team has built mission-critical AI systems at Snowflake, Lyft, Microsoft, Amazon, and IBM Research. We have top-tier publications in VLMs and AI at conferences such as CVPR, and we move with the speed and clarity of a startup obsessed with impact.

About the Role

We’re seeking a Machine Learning Engineer who thrives at the frontier of foundation-model research and production engineering. You’ll help define how machines learn from motion by training and fine-tuning large-scale Vision-Language Models to reason about complex, real-world video.

Your work will involve building multi-modal architectures that perceive, localize, and describe motion events (turns, lane changes, interactions, anomalies) across millions of frames. You will also turn those breakthroughs into robust APIs and SDKs used by enterprise customers.

You’ll work directly with the founders to:

Train and evaluate VLMs specialized for motion understanding on autonomous-driving and robotics datasets.

Design and scale GPU-accelerated pipelines for training, fine-tuning, and inference on multi-modal data (video + language + sensor metadata).

Build agentic evaluation frameworks that benchmark spatiotemporal reasoning, localization accuracy, and narrative consistency.

Develop and productionize curation loops that use our own models to generate and refine datasets (“AI training AI”).

Publish high-impact research (e.g., NeurIPS, CVPR) while shipping features that customers use immediately.

You’ll Excel If You Have

Strong proficiency in Python, PyTorch, and large-scale ML workflows.

Research experience in foundation models, VLMs, or multi-modal learning (publications/patents a plus).

Ability to iterate quickly and autonomously, running experiments end-to-end.

Experience training or fine-tuning models on video or sensor data.

Understanding of retrieval systems, embeddings, and GPU optimization.

Nice to Have

Contributions to open-source ML frameworks (e.g., DeepSpeed, Hugging Face).

Experience with vector databases, distributed training, or ML orchestration systems (e.g., Ray, Kubeflow, MLflow).

Prior exposure to autonomous-driving or robotics datasets.
