Broad Reach Partners
Base pay range
$140,000.00/yr - $150,000.00/yr
Are you a passionate Site Reliability Engineer with a keen eye for performance, scalability, and automation? Do you thrive in fast-paced environments, keeping production systems running smoothly to high availability and reliability standards? If so, we want to hear from you!
We are looking for a skilled Site Reliability Engineer (SRE) to join our dynamic team. In this role, you will play a crucial part in enhancing the stability, performance, and reliability of our production systems. You’ll work closely with development, DevOps, and security teams to improve observability, optimize system performance, and ensure production readiness. From monitoring to automation, you’ll make a direct impact on our cloud infrastructure and service reliability.
NOTE: This is a hybrid role, working from our offices in Alpharetta, GA a few days each week. YOU MUST LIVE IN THE ATLANTA AREA TO BE CONSIDERED FOR THIS ROLE!
What you’ll do
Monitoring & Observability
Maintain and enhance system observability using tools like New Relic and Graylog (or similar).
Develop actionable alerts and dashboards to track service health and performance metrics.
Reliability Engineering
Implement and maintain reliable, scalable systems, focusing on capacity planning, performance optimization, and fault tolerance.
Collaborate on defining and monitoring SLIs, SLOs, and SLAs for service reliability.
Automation & Infrastructure Operations
Automate operational tasks, minimizing manual interventions.
Manage Kubernetes workloads on AWS EKS, ensuring security and stability.
Leverage HashiCorp Vault to manage secrets and ensure compliance.
Incident & Problem Management
Participate in on-call rotation and resolve production incidents swiftly.
Troubleshoot production issues, perform root cause analysis, and implement permanent fixes.
Lead post-incident reviews and follow through on remediation actions.
Work with DevOps teams to enhance CI/CD pipelines for production readiness.
Partner with development teams to embed resilience and observability into applications.
Documentation & Knowledge Sharing
Create and maintain operational runbooks, escalation procedures, and production playbooks.
What we’re looking for
6+ years of experience in SRE, DevOps, or a similar role.
Expertise with AWS (EKS, EC2, S3, Route53, IAM).
6+ years managing production Kubernetes workloads.
Hands-on experience with monitoring tools like New Relic or Graylog.
Experience with HashiCorp Vault (or similar secrets management tools).
Proficiency in automation and CI/CD using GitHub Actions, GitLab, Helm, and ArgoCD.
Expertise in Infrastructure as Code (IaC), specifically with Terraform.
Strong scripting skills in Python, Bash, or similar languages.
Solid troubleshooting, debugging, and root cause analysis skills.
A willingness to participate in a 24/7 on-call rotation.
Bonus Skills (Nice to Have)
AWS Certification
Familiarity with the .NET application stack
Exposure to multi-cloud environments
Experience with Rancher for managing Kubernetes clusters on-prem
Familiarity with Packer for building Golden AMIs
Why Join Us?
Impactful Work: You’ll be directly involved in shaping the reliability and scalability of our production systems.
Hybrid Flexibility: Work from our Alpharetta office 3 days a week and enjoy flexibility on the other days.
Collaborative Culture: Join a passionate team of engineers and problem-solvers who believe in innovation and continuous improvement.
Growth & Development: We support your personal and professional growth with opportunities for learning and certifications.
Competitive Compensation: We offer an attractive salary, benefits, and perks.
About Us
We’re a growing company in a unique industry, committed to delivering reliable, high-performance solutions for our customers. If you’re a detail-oriented, proactive problem-solver with a passion for reliability and automation, you’ll thrive here. Ready to take the next step in your SRE career? Apply now and help us build the future of reliable systems!
Location
Alpharetta, GA (hybrid)