Highbrow LLC

Site Reliability Engineer

Highbrow LLC, Bellevue, Washington, US, 98009

Overview

We’re seeking a skilled Site Reliability Engineer (SRE) to join our growing engineering team at Highbrow LLC. You will be responsible for building, maintaining, and scaling production systems while improving reliability, availability, and performance. You’ll work at the intersection of software engineering and infrastructure, automating deployment, monitoring, and incident response.

Location and Position

Location: Bellevue, WA
Positions: 1

Responsibilities

Design, implement, and maintain scalable and reliable infrastructure using automation tools.
Develop and manage monitoring, alerting, and incident response systems to ensure high availability and performance of services.
Collaborate with development teams to ensure production readiness and enforce best practices for CI/CD, observability, and fault tolerance.
Troubleshoot and resolve production issues, conduct root cause analysis, and implement postmortem processes.
Continuously improve deployment pipelines, configuration management, and system orchestration tools.
Manage cloud infrastructure (e.g., AWS, GCP, Azure), Kubernetes clusters, and containerized applications.
Define and enforce SLOs/SLIs/SLAs and proactively maintain service health and uptime.
Participate in an on-call rotation and minimize pager fatigue through proactive system improvements.
Support security, compliance, and audit readiness efforts through automation and monitoring.

Required Qualifications

3–7 years of experience in SRE, DevOps, or backend infrastructure roles.
Strong understanding of Linux systems administration, networking, and performance tuning.
Proficiency in scripting and automation using Python, Go, Bash, or similar.
Experience with CI/CD pipelines (e.g., GitLab CI, Jenkins, ArgoCD).
Expertise in monitoring and observability tools (e.g., Prometheus, Grafana, ELK/EFK, Datadog).
Hands-on experience with cloud providers like AWS, GCP, or Azure.
Strong knowledge of Kubernetes, Docker, and container orchestration best practices.
Familiarity with infrastructure as code (IaC) using Terraform, Pulumi, or CloudFormation.
Excellent communication and collaboration skills; ability to work cross-functionally.

Preferred Qualifications

Experience in high-scale, high-availability environments.
Background in incident management, chaos engineering, or resilience testing.
Familiarity with service mesh technologies (e.g., Istio, Linkerd).
Experience working in regulated industries (e.g., fintech, healthcare, telecom).
Contributions to open-source SRE, DevOps, or cloud-native projects.

Application

To apply for this job, email your details to jobs@highbrow-tech.com.
