Socotra, Inc.
Staff / Principal Engineer – Core Engineering
Socotra, Inc., San Francisco, California, United States, 94199
Build the Future of Scalable AI at TrueFoundry
At TrueFoundry, we’re redefining how ML teams train, deploy, and scale their models. Our LLMOps and MLOps platform empowers organizations to experiment faster, train large-scale models reliably, and deploy them seamlessly on Kubernetes, with the same muscle as Big Tech.
We're looking for an Engineer who is passionate about scaling deep learning workloads, optimizing multi-GPU training, and shipping production-grade solutions.
The Role:
We are seeking a Staff / Principal Engineer to join our Core Engineering team as a senior technical leader based in the United States. You will:
Solve some of our most complex engineering problems and drive them alongside a team of engineers and ML researchers.
Build a deep, holistic understanding of the TrueFoundry platform across all components and shape the product vision and implementation.
Act as the technical face of engineering for customer-related discussions and escalations.
Guide and unblock engineers across projects in the US region.
Partner closely with our CTO and India-based engineering team to drive system design, architecture, and implementation of complex products.
Lead technical design, critical customer problem-solving, and platform scalability initiatives end-to-end.
This is a high-ownership, high-impact role designed for an engineer who loves combining world-class systems thinking with real-world execution.
What You’ll Do:
Develop deep expertise across TrueFoundry’s platform stack: infrastructure, deployment systems, LLM/ML orchestration, observability, cost optimization, and more.
Drive the system architecture and design for complex, distributed, cloud-native systems.
Act as the technical point-of-contact for enterprise customer engineering needs and escalations.
Lead and participate in design reviews, code reviews, and critical incident responses.
Collaborate closely with the CTO on architectural decisions, scaling strategies, and technical roadmap prioritization.
Guide and mentor US-based engineers across multiple initiatives, helping them deliver high-quality, scalable systems.
Identify and drive technical debt cleanup, performance improvements, and resilience upgrades across the platform.
Bring a product engineering mindset, ensuring that customer needs and feedback translate into scalable engineering solutions.
Who You Are:
8+ years of strong backend / systems engineering experience at top technology companies or startups.
Deep expertise in distributed systems, cloud-native architectures, and scalable system design.
Strong working knowledge of Kubernetes, containerized workloads, and infrastructure engineering.
Practical experience building or deploying ML/GenAI applications (or working closely with ML/DS teams).
Skilled in programming languages such as Python, Go, or TypeScript.
Solid understanding of system observability, resiliency design, and SRE practices.
Strong technical leadership and communication skills, with the ability to work with both customers and engineering teams.
Ability to think strategically while also executing hands-on when required.
Bonus: Experience supporting enterprise deployments of AI/ML infrastructure, model training, or inference systems.
Why Join TrueFoundry?
Work directly with ex-Facebook engineers and founders who are alumni of IIT Kharagpur, UC Berkeley, and Y Combinator.
First-hand exposure to building and scaling a deep-tech startup: insights you’ll carry if you want to start your own one day.
Be part of a fearlessly experimental culture focused on customer success and long-term impact.
Flexible hours, learning credits, and the opportunity to work shoulder-to-shoulder with the co-founders (Abhishek & Nikunj).