Fabrion

DevOps Engineer (Founding Team)

Fabrion, San Francisco, California, United States, 94199

Location: San Francisco Bay Area

Type: Full-Time

Compensation: Competitive salary + meaningful equity (founding tier)

Backed by 8VC, we’re building a world‑class team to tackle one of the industry’s most critical infrastructure problems.

About The Role

We’re building an AI‑native, multi‑tenant enterprise platform for complex domains in industrial verticals. In this architecture, DevOps isn’t just about shipping features; it’s about operationalizing intelligent agents, ensuring traceability across AI systems, and supporting mission‑critical ML infrastructure at scale.

We’re looking for a DevOps engineer who can own infrastructure from Day 1, automating everything from CI/CD and observability to cloud governance and security. You’ll work with a highly technical team building real‑time AI pipelines and multi‑agent systems. If you want to be the person who keeps the platform running fast, secure, reliable, and explainable, this is your role.

Responsibilities

Build and maintain scalable cloud infrastructure across AWS, GCP, and Azure with a focus on secure, tenant‑isolated deployments

Own and evolve CI/CD systems (e.g. GitHub Actions, ArgoCD) with progressive rollout, testing, and rollback flows

Establish observability tooling across services, agents, and pipelines (OpenTelemetry, Prometheus, Grafana, Sentry)

Implement policy‑as‑code (OPA, Rego) for deployment safety, RBAC, audit logging, and approval workflows

Define and enforce SLAs, uptime targets (99.99%+), and incident response and remediation workflows

Secure infrastructure: IAM, VPC, encryption, key management, image scanning, and secrets rotation

Automate deployments, infrastructure provisioning (Terraform, Helm), and environment replication

What We’re Looking For

Core Experience

4–10+ years in DevOps, platform engineering, or SRE in production‑grade systems

Strong experience with Docker, Kubernetes (EKS/GKE), Terraform or Pulumi

Hands‑on experience deploying and monitoring distributed cloud‑native systems

Familiarity with GitOps practices, CI/CD design, progressive delivery, and secure SDLC

Clear understanding of how to implement monitoring, alerting, and failure simulation in dynamic environments

Engineering Mindset

Obsessed with reliability, latency, uptime, and repeatability

Security‑aware and compliance‑conscious

Proactive — you don’t wait for alerts to fix things

Comfortable collaborating with backend, AI, and data teams

Bonus: Agent‑Native / ML Ops Capabilities

Experience running LLM orchestration frameworks (e.g. LangChain, LangGraph, Dust, ReAct agents)

Building retrieval‑augmented generation (RAG) pipelines — and deploying them safely and repeatably

Familiarity with vector DBs (Weaviate, Qdrant, Pinecone) and embedding pipelines

Monitoring and governing long‑running or multi‑agent chains

Auditability and replay systems for agent decision‑making

Serving fine‑tuned or open‑source LLMs with model versioning and GPU scaling (e.g. vLLM, TGI)

Interest in auto‑remediation using agents (e.g. observability + alert → insight → response via LLM)

Why This Role Matters

DevOps is the nervous system of the platform: every agent, every data fabric component, every pipeline flows through what you build. This is a rare opportunity to design that system early, the right way, and future‑proof it for scale, compliance, and trust.

If you’re excited by intelligent systems, distributed data, and deeply technical infrastructure problems — and you want your work to have immediate real‑world impact — we’d love to hear from you.
