CriticalRiver Inc.

Senior DevOps / Site Reliability Engineer

CriticalRiver Inc., Pleasanton, California, United States, 94566

About the Role:

We’re looking for an experienced Senior DevOps / Site Reliability Engineer to design and build the cloud and reliability foundation for a new multi-tenant SaaS platform, while supporting our existing products. This is a foundational early hire with high impact: you’ll define the AWS architecture, establish DevOps and SRE best practices, and ensure 99.9%+ uptime as the platform scales. You’ll work closely with Platform, Backend, Frontend, and AI teams to enable fast, secure deployments and production-grade reliability.

What You’ll Do:

Architect and manage AWS infrastructure (EKS, RDS, VPC, IAM, S3)

Build and maintain Terraform-based Infrastructure as Code

Own Kubernetes/EKS clusters, scaling, upgrades, and deployments

Design and optimize CI/CD pipelines (GitHub Actions/Jenkins, GitOps)

Implement monitoring, alerting, and observability (Datadog, CloudWatch)

Lead incident response, on‑call processes, and postmortems

Define and track SLOs/SLIs and error budgets

Implement security and compliance controls (SOC 2, IAM, encryption)

Required Qualifications:

7–10+ years of DevOps / SRE experience in production environments

Deep expertise in AWS and Kubernetes (EKS)

Strong experience with Terraform or CloudFormation

Proven ownership of CI/CD, monitoring, and incident management

Experience supporting multi-tenant B2B SaaS platforms

Strong scripting skills (Python or Bash)

Security‑first mindset with hands‑on compliance exposure

Seniority level: Mid‑Senior level

Employment type: Full‑time

Job function: Information Technology

Industries: IT Services and IT Consulting
