CriticalRiver Inc.
Senior DevOps / Site Reliability Engineer
CriticalRiver Inc., Pleasanton, California, United States, 94566
About the Role:
We’re looking for an experienced Senior DevOps / Site Reliability Engineer to design and build the cloud and reliability foundation for a new multi-tenant SaaS platform while supporting our existing products. This is a foundational, high-impact early hire: you’ll define the AWS architecture, establish DevOps and SRE best practices, and ensure 99.9%+ uptime as the platform scales. You’ll work closely with the Platform, Backend, Frontend, and AI teams to enable fast, secure deployments and production-grade reliability.
What You’ll Do:
Architect and manage AWS infrastructure (EKS, RDS, VPC, IAM, S3)
Build and maintain Terraform-based Infrastructure as Code
Own Kubernetes/EKS clusters, scaling, upgrades, and deployments
Design and optimize CI/CD pipelines (GitHub Actions/Jenkins, GitOps)
Implement monitoring, alerting, and observability (Datadog, CloudWatch)
Lead incident response, on‑call processes, and postmortems
Define and track SLOs/SLIs and error budgets
Implement security and compliance controls (SOC 2, IAM, encryption)
Required Qualifications:
7–10+ years of DevOps / SRE experience in production environments
Deep expertise in AWS and Kubernetes (EKS)
Strong experience with Terraform or CloudFormation
Proven ownership of CI/CD, monitoring, and incident management
Experience supporting multi-tenant B2B SaaS platforms
Strong scripting skills (Python or Bash)
Security‑first mindset with hands‑on compliance exposure
Seniority level: Mid‑Senior level
Employment type: Full‑time
Job function: Information Technology
Industries: IT Services and IT Consulting