Broad Reach Partners

Site Reliability Engineer

Broad Reach Partners, Alpharetta, Georgia, United States, 30239

Base pay range

$140,000.00/yr - $150,000.00/yr

Are you a passionate Site Reliability Engineer with a keen eye for performance, scalability, and automation? Do you thrive in fast-paced environments, ensuring seamless production systems that meet high availability and reliability standards? If so, we want to hear from you!

We are looking for a skilled Site Reliability Engineer (SRE) to join our dynamic team. In this role, you will play a crucial part in enhancing the stability, performance, and reliability of our production systems. You’ll work closely with development, DevOps, and security teams to improve observability, optimize system performance, and ensure production readiness. From monitoring to automation, you’ll make a direct impact on our cloud infrastructure and service reliability.

NOTE: This is a hybrid role, working from our offices in Alpharetta, GA a few days each week. YOU MUST LIVE IN THE ATLANTA AREA TO BE CONSIDERED FOR THIS ROLE!

What you’ll do

Monitoring & Observability

Maintain and enhance system observability using tools like New Relic and Graylog (or similar).
Develop actionable alerts and dashboards to track service health and performance metrics.

Reliability Engineering

Implement and maintain reliable, scalable systems, focusing on capacity planning, performance optimization, and fault tolerance.
Collaborate on defining and monitoring SLIs, SLOs, and SLAs for service reliability.

Automation & Infrastructure Operations

Automate operational tasks, minimizing manual interventions.
Manage Kubernetes workloads on AWS EKS, ensuring security and stability.
Leverage HashiCorp Vault to manage secrets and ensure compliance.

Incident & Problem Management

Participate in on-call rotation and resolve production incidents swiftly.
Troubleshoot production issues, perform root cause analysis, and implement permanent fixes.
Lead post-incident reviews and follow through on remediation actions.
Work with DevOps teams to enhance CI/CD pipelines for production readiness.
Partner with development teams to embed resilience and observability into applications.

Documentation & Knowledge Sharing

Create and maintain operational runbooks, escalation procedures, and production playbooks.

What we’re looking for

6+ years of experience in SRE, DevOps, or a similar role.
Expertise with AWS (EKS, EC2, S3, Route53, IAM).
6+ years managing production Kubernetes workloads.
Hands-on experience with monitoring tools like New Relic or Graylog.
Experience with HashiCorp Vault (or similar secrets management tools).
Proficiency in automation and CI/CD using GitHub Actions, GitLab, Helm, and ArgoCD.
Expertise in Infrastructure as Code (IaC), specifically with Terraform.
Strong scripting skills in Python, Bash, or similar languages.
Solid troubleshooting, debugging, and root cause analysis skills.
A willingness to participate in a 24/7 on-call rotation.

Bonus Skills (Nice to Have)

AWS Certification
Familiarity with the .NET application stack
Exposure to multi-cloud environments
Experience with Rancher for managing Kubernetes clusters on-prem
Familiarity with Packer for building Golden AMIs

Why Join Us?

Impactful Work: You’ll be directly involved in shaping the reliability and scalability of our production systems.
Hybrid Flexibility: Work from our Alpharetta office 3 days a week and enjoy flexibility on the other days.
Collaborative Culture: Join a passionate team of engineers and problem-solvers who believe in innovation and continuous improvement.
Growth & Development: We support your personal and professional growth with opportunities for learning and certifications.
Competitive Compensation: We offer an attractive salary, benefits, and perks.

About Us

We’re a growing company in a unique industry, committed to delivering reliable, high-performance solutions for our customers. If you’re a detail-oriented, proactive problem-solver with a passion for reliability and automation, you’ll thrive here.

Ready to take the next step in your SRE career? Apply now and help us build the future of reliable systems!

Location

Alpharetta, GA (hybrid)
