UniversalAGI
Founding ML Cloud Infrastructure Engineer
UniversalAGI, San Francisco, California, United States, 94199
San Francisco | Work directly with the CEO & founding team | 5 days onsite per week | Competitive salary + equity.
About the Company
UniversalAGI is building the OpenAI for Physics: high‑velocity AI models that drive end‑to‑end industrial automation from design to production. Backed by leaders such as Elad Gil, Eric Schmidt, Prith Banerjee, Ion Stoica, Jared Kushner, David Patterson, and Luis Videgaray, we empower breakthrough physics simulations with foundation AI models.
About the Role
As a founding ML Cloud Infrastructure Engineer, you will design and own the entire ML infrastructure stack—fine‑tuning, training, model serving, and deployment—at scale.
What You'll Do
Build and scale fine‑tuning & training infrastructure for foundation models, optimizing throughput, cost, and iteration speed across multi‑GPU and multi‑node clusters.
Design and implement model serving systems with low latency, high reliability, and the ability to handle complex physics workloads in production.
Build fine‑tuning pipelines that let customers adapt foundation models to their specific use cases, data, and workflows.
Build deployment and serving infrastructure for on‑premise and cloud environments, meeting customer security and compliance constraints.
Create robust data pipelines that ingest, validate, and preprocess massive CFD datasets from diverse sources.
Instrument everything: build observability, monitoring, and debugging tools that give the team and customers full visibility into model performance, data quality, and system health.
Work directly with customers on deployment, integration, and scaling challenges, turning pain points into product improvements.
Move fast and ship: take infrastructure from prototype to production in weeks, iterating on real customer needs.
Qualifications
3+ years of hands‑on ML infrastructure experience (training, serving, deployment).
Deep cloud platform expertise (AWS, GCP, Azure) plus infrastructure‑as‑code and container tooling (Terraform, Kubernetes, Docker).
Distributed training expertise using PyTorch Distributed, DeepSpeed, Ray, etc.
Strong foundation in ML serving (low‑latency inference, model optimization).
Expert‑level Python coding and familiarity with ML frameworks.
Understanding of ML workflows from training pipelines to production deployment.
Excellent communication to bridge customers, engineers, and researchers.
Outstanding execution velocity and problem‑solving skills.
Comfort in high‑intensity startup environments with evolving priorities.
Bonus Qualifications
Experience with CAD/CAM software.
Deploying ML in enterprise environments with strict security and compliance.
Model optimization techniques.
GPU programming and performance optimization (CUDA, Triton).
Large‑scale data engineering for ML, ETL pipelines, and data validation.
Building MLOps platforms or developer tools for ML teams.
Experience at high‑growth AI startups or leading AI labs.
Direct customer integration experience.
Open‑source contributions to ML infrastructure or training frameworks.
Cultural Fit
Technically respectful and hands‑on.
Intense, willing to grind when needed.
Customer‑obsessed, solving real problems.
Values deep work over meetings.
Ready to be on call whenever critical issues arise.
Clear communication to customers and team.
Growth mindset and continuous learning.
Startup mindset: ambiguity, rapid change, wearing multiple hats.
Strong work ethic, putting in extra hours when needed.
Collaboration with low ego and high accountability.
What We Offer
Opportunity to shape the technical foundation of a rapidly growing AI company.
Work on industrial AI problems with immediate real‑world impact.
Direct collaboration with the founder & CEO.
Competitive compensation + significant equity upside.
5‑day onsite culture.
Access to world‑class investors and advisors.
Health, dental, and vision benefits paid by the company.
401(k) plan.
Flexible vacation.
AI tools stipend, monthly commute stipend, monthly wellness/fitness stipend.
Daily office lunch & dinner covered.
Immigration support.
“The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; who errs, who comes short again and again… who at the best knows in the end the triumph of high achievement, and who at the worst, if he fails, at least fails while daring greatly.” – Teddy Roosevelt
At our core, we believe in being “in the arena.” We are builders, problem solvers, and risk‑takers who show up every day ready to put in the work: to sweat, to struggle, and to push past our limits. Real progress comes with missteps, iteration, and resilience. We embrace that journey fully, knowing that daring greatly is the only way to create something truly meaningful.
If you're ready to join the future of physics simulation, to push creative boundaries, and to deliver impact, UniversalAGI is the place for you.