Providence Partners, LLC

Sr. Site Reliability Engineer

Providence Partners, LLC, Austin, Texas, US, 78716

Senior Site Reliability Engineer (Sr. SRE)

Location: Hybrid (1-2 days/week)

We are looking for a Senior Site Reliability Engineer (SRE) to help scale and operate highly available, cloud-based systems. In this role, you'll sit at the intersection of software engineering, DevOps, and platform reliability, ensuring our systems are resilient, observable, and built to perform at scale.

You'll lead incident response, drive automation, and partner closely with engineering teams to embed reliability into everything we build.

What You'll Do:

Own the reliability, availability, and performance of production systems

Lead incident response, on-call operations, and blameless post-mortems

Build and improve monitoring, alerting, logging, and observability

Define and manage SLIs, SLOs, and error budgets

Design and build automation and self-service tools to reduce toil

Support cloud infrastructure (AWS, Azure, GCP) using Infrastructure as Code

Improve CI/CD pipelines and deployment reliability

Partner with engineers on system design and architecture

Create runbooks and operational documentation

Mentor team members and promote SRE and DevOps best practices

What We're Looking For:

5+ years of experience in Site Reliability Engineering, DevOps, Platform, or Cloud Engineering

Strong Linux and production troubleshooting skills

Hands-on experience with AWS, Azure, or GCP

Proficiency in Python, Go, Java, Bash, or similar languages

Experience with Terraform, Ansible, or other Infrastructure as Code tools

Experience supporting CI/CD pipelines and production deployments

Strong communication skills and a reliability-first mindset

Nice to Have:

Kubernetes and container orchestration experience

Experience with observability tools such as Prometheus, Grafana, Datadog, Splunk, or ELK

Experience with high-traffic, highly available systems

Knowledge of chaos engineering, error budgets, or AIOps

Cloud or Kubernetes certifications

Why Join Us:

Work on scalable, mission-critical platforms

Influence reliability and engineering best practices

Collaborative, blameless culture

Competitive compensation, benefits, and growth opportunities
