Socotra, Inc.

Staff / Principal Engineer – Core Engineering

Socotra, Inc., San Francisco, California, United States, 94199


Build the Future of Scalable AI at TrueFoundry

At TrueFoundry, we’re redefining how ML teams train, deploy, and scale their models. Our LLMOps and MLOps platform empowers organizations to experiment faster, train large-scale models reliably, and deploy them seamlessly on Kubernetes, with the same muscle as Big Tech.

We're looking for an engineer who is passionate about scaling deep learning workloads, optimizing multi-GPU training, and shipping production-grade solutions.

The Role:

We are seeking a Staff / Principal Engineer to join our Core Engineering team as a senior technical leader based in the United States. You will:

Solve some of the most complex engineering problems and drive their solutions alongside a team of engineers and ML researchers.

Build a deep, holistic understanding of the TrueFoundry platform across all components and shape the product vision and implementation.

Act as the technical face of engineering for customer-related discussions and escalations.

Guide and unblock engineers across projects in the US region.

Partner closely with our CTO and India-based engineering team to drive system design, architecture, and implementation of complex products.

Lead technical design, critical customer problem-solving, and platform scalability initiatives end-to-end.

This is a high-ownership, high-impact role designed for an engineer who loves combining world-class systems thinking with real-world execution.

What You’ll Do:

Develop deep expertise across TrueFoundry’s platform stack: infrastructure, deployment systems, LLM/ML orchestration, observability, cost optimization, and more.

Drive the system architecture and design for complex, distributed, cloud-native systems.

Act as the technical point of contact for enterprise customer engineering needs and escalations.

Lead and participate in design reviews, code reviews, and critical incident responses.

Collaborate closely with the CTO on architectural decisions, scaling strategies, and technical roadmap prioritization.

Guide and mentor US-based engineers across multiple initiatives, helping them deliver high-quality, scalable systems.

Identify and drive technical debt cleanup, performance improvements, and resilience upgrades across the platform.

Bring a product engineering mindset, ensuring that customer needs and feedback translate into scalable engineering solutions.

Who You Are:

8+ years of strong backend / systems engineering experience at top technology companies or startups.

Deep expertise in distributed systems, cloud-native architectures, and scalable system design.

Strong working knowledge of Kubernetes, containerized workloads, and infrastructure engineering.

Practical experience building or deploying ML/GenAI applications (or working closely with ML/DS teams).

Skilled in programming languages such as Python, Go, or TypeScript.

Solid understanding of system observability, resiliency design, and SRE practices.

Strong technical leadership and communication skills, with the ability to work with both customers and engineering teams.

Ability to think strategically while also executing hands-on when required.

Bonus: Experience supporting enterprise deployments of AI/ML infrastructure, model training, or inference systems.

Why Join TrueFoundry?

Work directly with ex-Facebook engineers and founders who are alumni of IIT Kharagpur, UC Berkeley, and Y Combinator.

First-hand exposure to building and scaling a deep-tech startup, gaining insights you’ll carry with you if you want to start your own company one day.

Be part of a fearlessly experimental culture focused on customer success and long-term impact.

Flexible hours, learning credits, and the opportunity to work shoulder-to-shoulder with the co-founders (Abhishek & Nikunj).
