Highbrow LLC
Overview
We’re seeking a skilled Site Reliability Engineer (SRE) to join our growing engineering team at Highbrow LLC. You will be responsible for building, maintaining, and scaling production systems while improving reliability, availability, and performance. You’ll work at the intersection of software engineering and infrastructure, automating deployment, monitoring, and incident response.
Location and Position
Location: Bellevue, WA
Positions: 1
Responsibilities
Design, implement, and maintain scalable and reliable infrastructure using automation tools.
Develop and manage monitoring, alerting, and incident response systems to ensure high availability and performance of services.
Collaborate with development teams to ensure production readiness and enforce best practices for CI/CD, observability, and fault tolerance.
Troubleshoot and resolve production issues, conduct root cause analysis, and implement postmortem processes.
Continuously improve deployment pipelines, configuration management, and system orchestration tools.
Manage cloud infrastructure (e.g., AWS, GCP, Azure), Kubernetes clusters, and containerized applications.
Define and enforce SLOs/SLIs/SLAs and proactively maintain service health and uptime.
Participate in an on-call rotation and minimize pager fatigue through proactive system improvements.
Support security, compliance, and audit readiness efforts through automation and monitoring.
Required Qualifications
3–7 years of experience in SRE, DevOps, or backend infrastructure roles.
Strong understanding of Linux systems administration, networking, and performance tuning.
Proficiency in scripting and automation using Python, Go, Bash, or similar.
Experience with CI/CD pipelines (e.g., GitLab CI, Jenkins, ArgoCD).
Expertise in monitoring and observability tools (e.g., Prometheus, Grafana, ELK/EFK, Datadog).
Hands-on experience with cloud providers like AWS, GCP, or Azure.
Strong knowledge of Kubernetes, Docker, and container orchestration best practices.
Familiarity with infrastructure as code (IaC) using Terraform, Pulumi, or CloudFormation.
Excellent communication and collaboration skills; ability to work cross-functionally.
Preferred Qualifications
Experience in high-scale, high-availability environments.
Background in incident management, chaos engineering, or resilience testing.
Familiarity with service mesh technologies (e.g., Istio, Linkerd).
Experience working in regulated industries (e.g., fintech, healthcare, telecom).
Contributions to open-source SRE, DevOps, or cloud-native projects.
Application
To apply for this job, email your details to jobs@highbrow-tech.com.