Backflip
Staff Machine Learning Data Engineer
Backflip, San Francisco, California, United States, 94199
About Backflip
Mechanical design, the work done in CAD, is the rate-limiter for progress in the physical world. Yet only 2-4 million people on Earth know how to CAD. But what if hundreds of millions could? What if creating something in the real world were as easy as imagining the use case, or sketching it on paper?
Backflip is building a foundation model for mechanical design: unifying the world’s scattered engineering knowledge into an intelligent, end-to-end design environment. Our goal is to enable anyone to imagine a solution and hit “print.”
Founded by a second-time CEO in the same space (first company: Markforged), Backflip combines deep industry insight with breakthrough AI research. Backed by a16z and NEA, we raised a $30M Series A and built a deeply technical, mission-driven team.
We’re building the AI foundation that tomorrow’s space elevators, nanobots, and spaceships will be built in.
If you’re excited to define the next generation of hard tech, come build it with us.
The Role
We’re looking for a Staff Machine Learning Data Engineer to lead and build the data pipelines powering Backflip’s foundation model for manufacturing and CAD.
You’ll design the systems, tools, and strategies that turn the world’s engineering knowledge - text, geometry, and design intent - into high-quality training data.
This is a core leadership role within the AI team, driving the data architecture, augmentation, and evaluation that underpin our model’s performance and evolution.
You’ll collaborate with Machine Learning Engineers to run data-driven experiments, analyze results, and deliver AI products that shape the future of the physical world.
What You’ll Do
Architect and own Backflip’s ML data pipeline, from ingestion to processing to evaluation.
Define data strategy: establish best practices for data augmentation, filtering, and sampling at scale.
Design scalable data systems for multimodal training (text, geometry, CAD, and more).
Develop and automate data collection, curation, and validation workflows.
Collaborate with MLEs to design and execute experiments that measure and improve model performance.
Build tools and metrics for dataset analysis, monitoring, and quality assurance.
Contribute to model development through insights grounded in data, shaping what, how, and when we train.
Who You Are
You’ve built and maintained ML data pipelines at scale, ideally for foundation or generative models, that shipped into production in the real world.
You have deep experience with data engineering for ML, including distributed systems, data extraction, transformation, and loading, and large-scale data processing (e.g., PySpark, Beam, Ray, or similar).
You’re fluent in Python and experienced with ML frameworks and data formats (Parquet, TFRecord, HuggingFace datasets, etc.).
You’ve developed data augmentation, sampling, or curation strategies that improved model performance.
You think like both an engineer and an experimentalist: curious, analytical, and grounded in evidence.
You collaborate well across AI development, infra, and product, and enjoy building the data systems that make great models possible.
You care deeply about data quality, reproducibility, and scalability.
You’re excited to help shape the future of AI for physical design.
Bonus points if:
You are comfortable working with a variety of complex data formats, e.g., for 3D geometry kernels or rendering engines.
You have an interest in math, geometry, topology, rendering, or computational geometry.
You’ve worked in 3D printing, CAD, or computer graphics domains.
Why Backflip
This is a rare opportunity to own the data backbone of a frontier foundation model and help define how AI learns to design the physical world.
You’ll join a world-class, mission-driven team operating at the intersection of research, engineering, and deep product sense, building systems that let people design the physical world as easily as they imagine it.
Your work will directly shape the performance, capability, and impact of Backflip’s foundation model, the core of how the world will build in the future.
Let’s build the tools the future will be made in.