Rethink recruit

Senior / Staff Site Reliability Engineer

Rethink recruit, San Francisco, California, United States, 94199

About Abridge

Abridge has built the most advanced AI platform for clinical conversations, trusted by over 100 major healthcare systems including Kaiser Permanente, Mayo Clinic, and CommonSpirit Health. We recently raised a $250M Series D (~$450M total funding to date) at a $2.75B valuation, backed by leading investors such as Lightspeed, Redpoint, IVP, Spark Capital, and Elad Gil. As we scale rapidly across enterprise healthcare, we are expanding our Site Reliability Engineering function to ensure our systems remain fast, stable, secure, and capable of supporting hyperscale adoption in a mission‑critical environment.

About the Role

Abridge is in hyperscale mode, and we are looking for highly experienced Senior and Staff SREs to dramatically improve the performance, stability, and scalability of our systems. This role is approximately 80% software‑focused and 20% cloud infrastructure‑focused, with deep emphasis on distributed systems, system performance, and engineering velocity. You will introduce load testing and chaos engineering into CI pipelines, develop profiling‑driven performance improvements, move applications onto more scalable infrastructure, and work closely with multiple engineering teams, sometimes embedded for extended periods, to deliver measurable impact. This is a 0→1, high‑autonomy opportunity to shape a next‑generation platform that powers real‑world healthcare workflows at massive scale.

What You’ll Do

Use load testing, chaos engineering, and profiling tools to identify performance and latency bottlenecks, making direct changes to application code.

Drive software‑level changes to rehome applications onto new infrastructure—new runtimes, databases, event‑driven architectures, and more—to dramatically increase scalability and support multi‑tenant deployments.

Implement software configuration and tuning improvements that meaningfully boost performance and resilience.

Build developer tools, modules, and enablements used across the entire engineering organization.

Partner with Platform Engineering to build and roll out new internal developer platform components (service templates, self‑serve infra, etc.).

Help application teams define and adopt SLOs, error budgets, health metrics, automated canary releases, and improved operational practices.

Strengthen Abridge’s incident‑response readiness by improving observability, runbooks, and organizational response patterns.

Document, train, and evangelize cloud‑native design strategies, tools, and best practices across the engineering org.

Represent Abridge externally as a technical evangelist in the platform engineering community through conferences, OSS contributions, and research.

What We’re Looking For

Experience Requirements

8–10+ years total engineering experience

3+ years in DevOps, Infra, Systems, Platform Engineering, or SRE roles

Must Have

Kubernetes & Terraform experience

Production application deployment experience

CI/CD pipeline work

Monitoring experience (Grafana, Datadog)

Hands‑on GCP experience (1+ years)

Ability and willingness to do IC work

Comfortable participating in an on‑call rotation (one week every 6–7 weeks)

Nice to Have

Python, Go, or Node.js experience

Multi‑cloud experience

FedRAMP or PCI compliance exposure

Experience working with or contracting for the VA

Growth‑stage startup experience (combined with Big Tech experience is great)

Other Requirements

Hybrid role: 3+ days/week onsite in NYC or SF

Must be able to start within 2 weeks of offer
