HireOTS
AI Infrastructure Engineer - PlayerZero
HireOTS, San Francisco, California, United States, 94199
A stealth-stage AI infrastructure company is building a self-healing system for software that automates defect resolution and development. The platform is used by engineering and support teams to:

Autonomously debug problems in production software
Fix issues directly in the codebase
Prevent recurring issues through intelligent root-cause automation

The company is backed by top-tier investors such as Foundation Capital, WndrCo, and Green Bay Ventures, as well as prominent operators including Matei Zaharia, Drew Houston, Dylan Field, Guillermo Rauch, and others. We believe that as software development accelerates, the burden of maintaining quality and reliability shifts heavily onto engineering and support teams. This challenge creates a rare opportunity to reimagine how software is supported and sustained, with AI-powered systems that respond autonomously.

About the Role
We’re looking for an experienced backend/infrastructure engineer who thrives at the intersection of systems and AI, and who loves turning research prototypes into rock-solid production services. You’ll design and scale the core backend that powers our AI inference stack, from ingestion pipelines and feature stores to GPU orchestration and vector search. If you care deeply about performance, correctness, observability, and fast iteration, you’ll fit right in.

What You’ll Do

Own mission-critical services end-to-end, from architecture and design reviews to deployment, observability, and service-level objectives.
Scale LLM-driven systems: build RAG pipelines, vector indexes, and evaluation frameworks handling billions of events per day.
Design data-heavy backends: streaming ETL, columnar storage, time-series analytics, all fueling the self-healing loop.
Optimize for cost and latency across compute types (CPUs, GPUs, serverless); profile hot paths and squeeze out milliseconds.
Drive reliability: implement automated testing, chaos engineering, and progressive rollout strategies for new models.
Work cross-functionally with ML researchers, product engineers, and real customers to build infrastructure that actually matters.
You Might Thrive in This Role If You:

Have 2–5+ years of experience building scalable backend or infra systems in production environments
Bring a builder mindset: you like owning projects end-to-end and thinking deeply about data, scale, and maintainability
Have transitioned ML or data-heavy prototypes to production, balancing speed and robustness
Are comfortable with data engineering workflows: parsing, transforming, indexing, and querying structured or unstructured data
Have some exposure to search infrastructure or LLM-backed systems (e.g., document retrieval, RAG, semantic search)
Bonus Points

Experience with vector databases (e.g., pgvector, Pinecone, Weaviate) or inverted-index search (e.g., Elasticsearch, Lucene)
Hands-on with GPU orchestration (Kubernetes, Ray, KServe) or model-parallel inference tuning
Familiarity with Go / Rust (primary stack), with some TypeScript for light full-stack tasks
Deep knowledge of observability tooling (OpenTelemetry, Grafana, Datadog) and profiling distributed systems
Contributions to open-source ML or systems infrastructure projects