CriticalRiver Inc.

Senior DevOps / Site Reliability Engineer (Pleasanton)

CriticalRiver Inc., Pleasanton, California, United States, 94566

About the Role: We're looking for an experienced Senior DevOps / Site Reliability Engineer to design and build the cloud and reliability foundation for a new multi-tenant SaaS platform, while supporting our existing products. This is a foundational early hire with high impact: you'll define AWS architecture, establish DevOps and SRE best practices, and ensure 99.9%+ uptime as we scale the platform. You'll work closely with Platform, Backend, Frontend, and AI teams to enable fast, secure deployments and production-grade reliability.

What You'll Do:
- Architect and manage AWS infrastructure (EKS, RDS, VPC, IAM, S3)
- Build and maintain Terraform-based Infrastructure as Code
- Own Kubernetes/EKS clusters, scaling, upgrades, and deployments
- Design and optimize CI/CD pipelines (GitHub Actions/Jenkins, GitOps)
- Implement monitoring, alerting, and observability (Datadog, CloudWatch)
- Lead incident response, on-call processes, and postmortems
- Define and track SLOs/SLIs and error budgets
- Implement security and compliance controls (SOC 2, IAM, encryption)

Required Qualifications:
- 7-10+ years of DevOps / SRE experience in production environments
- Deep expertise in AWS and Kubernetes (EKS)
- Strong experience with Terraform or CloudFormation
- Proven ownership of CI/CD, monitoring, and incident management
- Experience supporting multi-tenant B2B SaaS platforms
- Strong scripting skills (Python or Bash)
- Security-first mindset with hands-on compliance exposure