Gradle Inc.

Senior Site Reliability Engineer

Gradle Inc., San Francisco, California, United States, 94199

About the Role

Join Gradle Inc. as a Senior Site Reliability Engineer overseeing the reliability, performance, and availability of Develocity instances serving paying customers, open‑source projects, and public‑facing services, along with supporting infrastructure such as artifact registries.

Company Overview

Develocity is a first‑of‑its‑kind toolchain observability and acceleration platform that helps software teams improve DORA capabilities across Gradle, Maven, sbt, npm, and Python. It supports both CI and local builds, accelerating delivery and deepening observability.

Core Values

Seek to Understand

Know the Why

Innovate & Iterate

Own the Outcome

What You'll Do

Operate and maintain all Develocity instances and supporting services.

Participate in a follow‑the‑sun on‑call rotation, owning incident response and troubleshooting across the stack.

Drive automation across deployment, upgrades, monitoring, self‑healing, and recovery.

Build and maintain observability (logging, metrics, tracing, alerting) for all managed services.

Collaborate with engineering teams to embed reliability into features from the start.

Run incident response and retrospectives, and apply the lessons learned.

Own disaster recovery, backups, and business continuity.

Communicate with customers during incidents and maintenance windows.

Optimize performance, resource usage, and cost.

Help evolve our SaaS operations as we scale.

Minimum Qualifications

5+ years in an SRE, DevOps, or equivalent role operating production services at scale.

Strong Kubernetes experience in production environments.

Cloud infrastructure expertise, preferably AWS (EKS, RDS, S3, EC2).

Proficiency with observability tools (Prometheus, Grafana) and IaC (Terraform).

Track record of incident management and response.

Knowledge of SRE best practices (SLAs, SLOs).

Proficiency in scripting (Python, Bash) for automation.

Experience with 24/7 on‑call rotations.

Strong written and verbal English communication.

Preferred Qualifications

Experience operating SaaS platforms at scale.

Familiarity with Develocity.

JVM language experience (Java, Kotlin).

Disaster recovery planning and execution.

Customer‑facing incident communication skills.

Experience establishing SRE practices in new or growing teams.

What We Offer

Ground‑floor role in a new SRE team with real ownership of production systems.

Direct interaction with customers when issues arise.

A culture that values automation over heroics.

In‑person meetings, such as the annual company offsite and team gatherings.

Remote‑first environment with work‑from‑home flexibility.

Competitive salary and equity grants.

Compensation

US salary range: $150,000 – $190,000. Pay is determined by location, experience, skills, seniority, performance, and travel requirements.

Location

Remote from anywhere in the PST timezone.

Seniority Level

Mid‑Senior

Employment Type

Full‑time

Job Function

Architecture and Planning