Rethink recruit
Senior / Staff Site Reliability Engineer
Rethink recruit, San Francisco, California, United States, 94199
About Abridge
Abridge has built the most advanced AI platform for clinical conversations, trusted by over 100 major healthcare systems including Kaiser Permanente, Mayo Clinic, and CommonSpirit Health. We recently raised a $250M Series D (~$450M total funding to date) at a $2.75B valuation, backed by leading investors such as Lightspeed, Redpoint, IVP, Spark Capital, and Elad Gil. As we scale rapidly across enterprise healthcare, we are expanding our Site Reliability Engineering function to ensure our systems remain fast, stable, secure, and capable of supporting hyperscale adoption in a mission‑critical environment.
About the Role
Abridge is in hyperscale mode, and we are looking for highly experienced Senior and Staff SREs to dramatically improve the performance, stability, and scalability of our systems. This role is approximately 80% software‑focused and 20% cloud infrastructure‑focused, with deep emphasis on distributed systems, system performance, and engineering velocity. You will introduce load testing and chaos engineering into CI pipelines, develop profiling‑driven performance improvements, move applications onto more scalable infrastructure, and work closely with multiple engineering teams, sometimes embedded for extended periods, to deliver measurable impact. This is a 0→1, high‑autonomy opportunity to shape a next‑generation platform that powers real‑world healthcare workflows at massive scale.
What You’ll Do
Use load testing, chaos engineering, and profiling tools to identify performance and latency bottlenecks, making direct changes to application code.
Drive software‑level changes to rehome applications onto new infrastructure—new runtimes, databases, event‑driven architectures, and more—to dramatically increase scalability and support multi‑tenant deployments.
Implement software configuration and tuning improvements that meaningfully boost performance and resilience.
Build developer tools, modules, and enablements used across the entire engineering organization.
Partner with Platform Engineering to build and roll out new internal developer platform components (service templates, self‑serve infra, etc.).
Help application teams define and adopt SLOs, error budgets, health metrics, automated canary releases, and improved operational practices.
Strengthen Abridge’s incident‑response readiness by improving observability, runbooks, and organizational response patterns.
Document, train, and evangelize cloud‑native design strategies, tools, and best practices across the engineering org.
Represent Abridge externally as a technical evangelist in the platform engineering community through conferences, OSS contributions, and research.
What We’re Looking For
Experience Requirements
8–10+ years total engineering experience
3+ years in DevOps, Infra, Systems, Platform Engineering, or SRE roles
Must Have
Kubernetes & Terraform experience
Production application deployment experience
CI/CD pipeline work
Monitoring experience (Grafana, Datadog)
Hands‑on GCP experience (1+ years)
Ability and willingness to do IC work
Comfortable participating in an on‑call rotation (one week every 6–7 weeks)
Nice to Have
Python, Go, or Node.js experience
Multi‑cloud experience
FedRAMP or PCI compliance exposure
Experience working with or contracting for the VA
Growth‑stage startup experience (combined with Big Tech experience is great)
Other Requirements
Hybrid role: 3+ days/week onsite in NYC or SF
Must be able to start within 2 weeks of offer