Ciitizen

Senior Site Reliability Engineer

Ciitizen, San Mateo, California, United States, 94409

Who We Are

At Citizen Health, we have a singular mission: to improve the lives of the 350 million+ people suffering from rare and complex conditions worldwide. We empower patients by providing seamless access to and control over their health data, enabling them to share it safely across a multi-sided platform with caregivers, providers, and researchers. This transparency accelerates the discovery of better treatments, speeds therapy development, and ultimately improves patient outcomes.

We’re more than just a company — we’re a passionate community of patients, caregivers, researchers, and builders united by lived experience and a shared commitment to transforming rare disease care. Led by a seasoned founding team with deep healthcare and consumer startup expertise, and backed by top-tier investors, we are a close-knit, mission-driven team. We foster a culture where innovation, empathy, and impact thrive.

Why Join Us?

At Citizen Health, your work will directly improve the lives of people who need it most. We value curiosity, creativity, and collaboration. Here, you’ll find:

- A culture that celebrates empathy and diversity, recognizing that our strength lies in our varied perspectives.
- The autonomy to own your projects and influence company direction.
- A fast-paced, mission-driven environment where your contributions make a tangible difference every day.
- A community that supports continuous learning and growth.

The Role

Citizen Health is seeking a Senior Site Reliability Engineer (SRE) to ensure the resilience, performance, and availability of our AI-powered, patient-centric healthcare platform.

In this hands-on, high-impact role, you will apply software engineering principles to operational challenges—designing and maintaining reliable systems that scale, fail gracefully, and recover quickly. You'll work cross-functionally to establish SLOs/SLIs, implement robust observability, establish a metrics-driven approach to service performance, and drive improvements in incident response, fault tolerance, and service reliability.

If you're passionate about building systems that stay up, scale well, and recover fast—and you thrive on solving reliability challenges in modern cloud-native environments—we’d love to talk to you.

Responsibilities

Reliability Engineering & Observability

- Define and measure service reliability through SLIs, SLOs, and error budgets.
- Implement and operate observability tooling (e.g., New Relic, Prometheus …) across cloud and Kubernetes environments.
- Analyze logs, traces, and metrics to surface actionable insights and improve system health.
- Perform capacity planning, load testing, and performance profiling and tuning to support scale and reliability and to optimize system performance.

Resilience & Automation

- Design and maintain resilient, self-healing infrastructure in AWS and Kubernetes (EKS).
- Conduct chaos engineering experiments, failure mode analysis, and disaster recovery drills to proactively identify and fix weaknesses.
- Build automation to reduce toil, improve reliability metrics (latency, uptime, error rates, MTTD, and MTTR), and prevent recurrence of incidents.
- Engineer infrastructure for fault tolerance, auto-scaling, and graceful degradation.

Incident Response & Operations

- Drive incident response efforts, manage on-call rotations, and coordinate resolution of production outages.
- Conduct root cause analyses and blameless postmortems to drive learning and resilience.
- Continuously improve key reliability metrics such as latency, uptime, error rates, and availability.
- Collaborate with security, platform, and DevOps teams to ensure high availability and production readiness of services.

Cross-Team Collaboration & Culture

- Collaborate with engineering teams during design and architecture phases to assess and mitigate reliability risks.
- Support progressive delivery strategies including feature flags and canary deployments.
- Champion SRE principles and practices, helping build a culture of resilience and shared ownership.
- Stay current with emerging practices in cloud reliability, observability, and SRE tooling.

Who You Are

You are a hands-on Senior Site Reliability Engineer who thrives in fast-paced, high-stakes environments where reliability is mission-critical. You bring deep experience operating distributed systems at scale, with a strong foundation in cloud-native infrastructure, Kubernetes, and observability.

You think like a software engineer but focus like an operator—using code to solve operational challenges. You’re driven by making systems more resilient, reducing downtime, and building fault-tolerant architectures that scale with user demand. You value data-driven decision making, and see SLIs/SLOs, incident postmortems, and continuous improvement as essential tools—not checkboxes.

You have a strong sense of ownership, thrive under pressure, and believe the best systems are the ones that heal themselves. You collaborate closely across teams, care deeply about the end-user experience, and are always looking for better ways to keep complex systems running smoothly.

Must-Have Skills

- 5+ years in Site Reliability, DevOps, or Infrastructure Engineering roles.
- Strong software engineering skills in languages such as Python, Go, or Bash.
- Deep expertise operating production systems in AWS, with additional experience in GCP or Azure.
- Proven experience operating and scaling Kubernetes (EKS) in production.
- Experience implementing GitOps with FluxCD (or similar).
- Hands-on experience implementing observability, auto-scaling, and self-healing systems.
- Strong foundation in networking, load balancing, CDN, and container orchestration.
- Solid knowledge of performance optimization techniques (e.g., profiling, caching, tuning).
- Strong incident response background, including postmortems and SLO/SLI development, driving improvements through data and analysis.
- Passion for reliability, automation, operational excellence, and building systems patients and clinicians can trust.
- Excellent communication and collaboration skills.
- A methodical approach to problem-solving and system design.

Preferred Skills

- Deep understanding of distributed systems, fault tolerance, and failure modes.
- Experience implementing chaos engineering practices (Gremlin, Chaos Mesh, etc.).
- Familiarity with multi-region, multi-cloud reliability strategies.
- Experience with service meshes (e.g., Istio) and resilience patterns.
- Solid background in security, compliance, and operational hardening (HIPAA, SOC 2).
- Experience in capacity planning, scaling, and disaster recovery design.

Benefits & Culture

Ownership & Impact

You’ll have a seat at the table — your ideas will shape our products and culture. Work on mission-critical projects that directly improve patient lives every day.

Growth & Development

Rapid company growth means expanding opportunities and career progression. Supportive environment for learning, experimentation, and innovation.

Culture & Community

Transparent, inclusive, and genuinely fun workplace — we believe great work happens when you feel valued and inspired. Regular team activities, knowledge sharing, and a culture that prioritizes well-being.

Additional Perks

Competitive salary + equity package. Comprehensive health, dental, and vision insurance. Unlimited paid time off and flexible hybrid work environment.

Don’t meet every qualification?

No worries! We believe that passion, curiosity, and the right mindset are just as important as a checklist of skills. If you’re excited about what we’re building, we encourage you to apply.

Our Commitment to Diversity & Inclusion

Citizen Health is proud to be an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We welcome applicants of all backgrounds, identities, and experiences. Everyone deserves an equal chance to contribute and grow here, regardless of race, gender identity, sexual orientation, religion, national origin, age, disability, or veteran status.

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Technology, Information and Internet
