Branch Metrics

Staff Site Reliability Engineer

Branch Metrics, Denver, Colorado, United States, 80285


At Branch, we’re transforming how brands and users interact across digital platforms. Our mobile marketing and deep linking solutions are trusted to deliver seamless experiences that increase ROI, decrease wasted spend, and eliminate siloed attribution. Our Branch team consists of smart, humble, and collaborative people who value ownership over all. Everything we do is centered around creating a great product, team, and company that lives and breathes our motto: Build Together, Grow Together, Win Together.

As a Staff Site Reliability Engineer at Branch, you will lead our engineering‑wide reliability program, shaping how teams build, measure, and operate reliable systems at scale. You’ll partner across product and platform teams to establish standards for service ownership, observability, and SLOs, and ensure reliability is embedded throughout the engineering lifecycle.

In this role, you’ll act as both a hands‑on technical leader and a mentor, introducing best practices, creating frameworks, and driving adoption of SRE principles across the organization. You’ll have visibility at the executive level, reporting progress and impact to the VP of Engineering and other stakeholders.

Responsibilities

Lead the design and execution of an engineering‑wide reliability program, ensuring teams adopt SRE principles and best practices.

Define and champion service ownership standards, partnering with product and platform teams to embed reliability into the development lifecycle.

Establish and evolve observability practices (metrics, logs, traces), ensuring teams have the tooling and insights to detect, debug, and prevent incidents.

Partner with engineering leaders to define SLIs, SLOs, and error budgets, and ensure they are actionable and tied to business outcomes.

Collaborate with teams to design systems for resilience, scalability, and fault tolerance, reducing operational risk.

Provide mentorship and guidance to engineers across the organization, helping them improve their operational skills and reliability mindset.

Identify opportunities to add automation that increases developer productivity and reduces toil.

Create standards, frameworks, and runbooks that scale reliability practices across multiple product lines and teams.

Participate in and improve incident response practices (on‑call strategy, SEVs, post‑mortems, blameless culture).

Report on progress, trends, and impact of the reliability program to leaders and stakeholders.

Core Qualifications

7+ years of experience in Site Reliability Engineering, Systems Engineering, or related fields, with at least 2–3 years in a senior/staff‑level role.

Strong software engineering skills in one or more languages (e.g., Python, Go, Java).

Expertise with cloud infrastructure (AWS preferred) and distributed systems at scale.

Deep understanding of observability practices (metrics, logs, tracing) and hands‑on experience with tools like Datadog, Prometheus, Grafana, or equivalent.

Strong background in adding automation in key areas to increase developer productivity and reduce toil.

Proven experience defining and rolling out SLIs, SLOs, and error budgets across engineering teams.

Strong background in incident response, post‑mortems, and on‑call operations, with a bias toward automation and reducing toil.

Demonstrated ability to influence and mentor engineers across multiple teams, driving adoption of SRE and reliability best practices.

Excellent communication skills, with the ability to convey technical concepts and reliability trade‑offs to engineers, leadership, and stakeholders.

Nice to Have

Experience with Kubernetes and container orchestration.

Familiarity with infrastructure‑as‑code tools (Terraform, CloudFormation, or similar).

Knowledge of CI/CD systems and modern release engineering practices.

Prior experience building or leading an organization‑wide reliability program.

Familiarity with security and compliance considerations for large‑scale platforms.

This role is 100% remote within Colorado and is not eligible for remote work in any other location.

In accordance with applicable law, the estimated pay range for this role, if based in Colorado, is $169,000 – $215,000.

Branch is an equal opportunity employer. All applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, or disability status.

At Branch, we strive to create an inclusive culture that encourages people from all walks of life to bring their unique, diverse perspectives to work. We aim every day to build an environment that empowers us all to do the best work of our careers, and we can’t wait to show you what we have to offer!
