Field AI
AI/ML Engineer - Multimodal (Mid-level/Senior)
Field AI, Irvine, California, United States, 92713
Who Are We?
Field AI is transforming how robots interact with the real world.
We are building risk-aware, reliable, and field-ready AI systems that address the most complex challenges in robotics, unlocking the full potential of embodied intelligence. We go beyond typical data-driven approaches and pure transformer-based architectures, charting a new course with globally deployed solutions that deliver real-world results and models that improve rapidly through field applications. Learn more at https://fieldai.com.
About the Job
Our Field Foundation Model (FFM) powers a global fleet of autonomous robots that capture massive streams of multimodal data across diverse, dynamic environments every day. As part of the Insight Team, our mission is to transform this raw, multimodal data into actionable insights that empower our customers and engineers to deliver value.

The Field-insight Foundation Model (FiFM) is at the core of how we turn multimodal data from autonomous robots into those insights. As an AI/ML Engineer on the FiFM team, you will drive research and model development for one of Field AI's most ambitious initiatives. Your work will span computer vision, vision-language models (VLMs), multimodal scene understanding, and long-memory video analysis and search, with a strong emphasis on agentic AI (tool use, memory, and multimodal retrieval-augmented generation). This is a full-cycle ML role: you'll curate datasets, fine-tune and evaluate models, optimize inference, and deploy them into production. It's a blend of applied research and engineering, requiring creativity, rapid experimentation, and rigorous problem-solving. While FiFM is your primary focus, you'll also contribute to broader perception and insight-generation initiatives across Field AI.
What You'll Get To Do:
- Train and fine-tune million- to billion-parameter multimodal models, with a focus on computer vision, video understanding, and vision-language integration.
- Track state-of-the-art research, adapt novel algorithms, and integrate them into FiFM.
- Curate datasets and develop tools to improve model interpretability.
- Build scalable evaluation pipelines for vision and multimodal models.
- Contribute to model observability, drift detection, and error classification.
- Fine-tune and optimize open-source VLMs and multimodal embedding models for efficiency and robustness (see the sketch after this list).
- Build and optimize multi-vector RAG pipelines with vector DBs and knowledge graphs.
- Create embedding-based memory and retrieval chains with token-efficient chunking strategies.
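To give a concrete flavor of the fine-tuning work above, here is a minimal sketch of attaching LoRA adapters to an open-source VLM with HuggingFace Transformers and PEFT. The checkpoint name and hyperparameters are illustrative assumptions, not our production configuration.

```python
# Minimal LoRA fine-tuning sketch using HuggingFace Transformers + PEFT.
# Checkpoint and hyperparameters are illustrative assumptions only.
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "llava-hf/llava-1.5-7b-hf"  # hypothetical example checkpoint
processor = AutoProcessor.from_pretrained(model_id)  # batches image+text pairs
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Inject low-rank adapters into the attention projections; only the adapter
# weights are trained, keeping memory needs far below full fine-tuning.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of all params
```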
What You Have:
- Master's/Ph.D. in Computer Science, AI/ML, Robotics, or equivalent industry experience.
- 2+ years of industry experience or relevant publications in CV/ML/AI.
- Strong expertise in computer vision, video understanding, temporal modeling, and VLMs.
- Proficiency in Python and PyTorch, with production-level coding skills.
- Experience building pipelines for large-scale video/image datasets.
- Familiarity with AWS or other cloud platforms for ML training and deployment.
- Understanding of MLOps best practices (CI/CD, experiment tracking).
- Hands-on experience fine-tuning open-source multimodal models using HuggingFace, DeepSpeed, vLLM, FSDP, and LoRA/QLoRA.
- Knowledge of precision tradeoffs (FP16, bfloat16, quantization) and multi-GPU optimization (illustrated in the sketch after this list).
- Ability to design scalable evaluation pipelines for vision/VLMs and agent performance.
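As a small illustration of those precision tradeoffs, the sketch below runs one training step under bfloat16 autocast in plain PyTorch. The tiny linear model is a stand-in; a real multi-GPU run would wrap the model in FSDP or DDP.

```python
# Sketch of a bfloat16 mixed-precision training step in plain PyTorch.
# bfloat16 keeps fp32's exponent range, so no GradScaler-style loss
# scaling is needed (fp16 would require one).
import torch

model = torch.nn.Linear(1024, 1024).cuda()  # stand-in for a real VLM
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 1024, device="cuda")
target = torch.randn(8, 1024, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    # Matrix multiplies run in bf16; parameters and gradients stay fp32.
    loss = torch.nn.functional.mse_loss(model(x), target)

loss.backward()
optimizer.step()
optimizer.zero_grad(set_to_none=True)
```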
The Extras That Set You Apart:
- Experience with agentic/RAG pipelines and knowledge graphs (LangChain, LangGraph, LlamaIndex, OpenSearch, FAISS, Pinecone); a minimal retrieval example follows this list.
- Familiarity with agent operations logging and evaluation frameworks.
- Background in optimization: token cost reduction, chunking strategies, reranking, and retrieval latency tuning.
- Experience deploying models under quantized (int4/int8) and distributed multi-GPU inference.
- Exposure to open-vocabulary detection, zero/few-shot learning, and multimodal RAG.
- Knowledge of temporal-spatial modeling (event/scene graphs).
- Experience deploying AI in edge or resource-constrained environments.
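For the retrieval side of this work, here is a minimal embedding-search sketch with FAISS, assuming sentence-transformers as the encoder. The model name and chunk texts are hypothetical examples, not real mission data.

```python
# Illustrative embedding-retrieval sketch with FAISS; encoder, model name,
# and chunk texts are hypothetical examples.
import faiss
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed example model
chunks = [
    "Robot A traversed the north tunnel at 09:14.",
    "Thermal anomaly detected near pump station 3.",
    "Inspection route completed with two skipped waypoints.",
]  # in practice: token-budgeted chunks of multimodal mission logs

# Normalized embeddings + inner-product index = cosine similarity search.
embeddings = encoder.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

query = encoder.encode(["Where was the thermal anomaly?"],
                       normalize_embeddings=True)
scores, ids = index.search(query, 2)
print([chunks[i] for i in ids[0]])  # top-2 chunks to hand to the generator
```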
Compensation and Benefits
Our salary range is generous ($70,000 to $200,000 per year), but we take an individual's background and experience into consideration when determining final salary; base pay offered may vary considerably depending on geographic location, job-related knowledge, skills, and experience. Also, while we enjoy being together on-site, we are open to exploring a hybrid or remote option.
Why Join Field AI?
We are solving one of the world's most complex challenges: deploying robots in unstructured, previously unknown environments. Our Field Foundational Models™ set a new standard in perception, planning, localization, and manipulation, ensuring our approach is explainable and safe for deployment.
You will have the opportunity to work with a world-class team that thrives on creativity, resilience, and bold thinking. With a decade-long track record of deploying solutions in the field, winning DARPA challenge segments, and bringing expertise from organizations like DeepMind, NASA JPL, Boston Dynamics, NVIDIA, Amazon, Tesla Autopilot, Cruise Self-Driving, Zoox, Toyota Research Institute, and SpaceX, we are set to achieve our ambitious goals.
Be Part of the Next Robotics Revolution
To tackle such ambitious challenges, we need a team as unique as our vision: innovators who go beyond conventional methods and are eager to take on tough, uncharted questions. We're seeking individuals who challenge the status quo, dive into unexplored territory, and bring interdisciplinary expertise. Our team requires not only top AI talent but also exceptional software developers, engineers, product designers, field deployment experts, and communicators.
We are headquartered in always-sunny Mission Viejo (Irvine-adjacent), Southern California, and have US-based and global teammates.
Join us, shape the future, and be part of a fun, close-knit team on an exciting journey!