Zencon Group

Site Reliability Engineer (SRE)

Job Title: SRE
Location: Atlanta GA (Hybrid model)
Job Description:
Key skills: AWS Lambda, AWS Step Functions, Kubernetes, Docker, Sumo Logic, DevOps, GitLab CI, Python
Job Summary:
We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our dynamic DevOps team in Atlanta, GA. This hybrid role is ideal for a proactive engineer who thrives in a fast-paced environment, has deep expertise in automation, observability, and CI/CD, and is comfortable working across infrastructure and development teams.
As an SRE, you will be responsible for improving the reliability, availability, and performance of our cloud-native systems. Your day-to-day work will involve AWS Lambda, Kubernetes, Docker, GitLab CI, Python, and Sumo Logic, among other tools. You will also develop and maintain AWS Step Functions workflows and other serverless components.
Key Responsibilities:

Design, implement, and maintain scalable, reliable infrastructure using AWS Lambda, Step Functions, and other serverless technologies.
Deploy, manage, and monitor containers using Kubernetes and Docker.
Build and maintain CI/CD pipelines with GitLab CI.
Develop automation scripts and internal tooling using Python.
Enhance system observability through integration and management of Sumo Logic for monitoring and log aggregation.
Collaborate with development, QA, and operations teams to drive site reliability best practices across the SDLC.
Identify performance bottlenecks and lead incident response, root cause analysis, and postmortem activities.
Participate in an on-call rotation to support production systems.

Required Skills and Experience:

3-6 years of experience in SRE, DevOps, or Cloud Infrastructure roles.
Strong experience with AWS services, including Lambda, Step Functions, EC2, CloudWatch, IAM, and S3.
Proficiency in Kubernetes (EKS preferred) and Docker containerization.
Hands-on experience with GitLab CI/CD pipelines.
Programming/scripting knowledge in Python.
Solid understanding of monitoring, alerting, and observability tools (Sumo Logic preferred).
Strong problem-solving skills and the ability to work in a fast-paced agile environment.
Experience with infrastructure as code (IaC) and version control best practices.

Preferred Qualifications:

AWS certifications (e.g., AWS Certified DevOps Engineer, AWS Certified Solutions Architect)
Experience in hybrid cloud environments or enterprise-scale distributed systems
Familiarity with other observability tools like Datadog, Prometheus, or Grafana
Experience with incident management and SRE metrics (SLIs, SLOs, error budgets)