TEKsystems

Senior Site Reliability Engineer

TEKsystems, Atlanta, Georgia, United States, 30383

Overview

This is a 3-month W2 contract-to-hire role for a Senior Site Reliability Engineer to augment an existing SRE team. The goal is to establish a solid observability platform, guide tool selection, and lead migrations. The role involves vendor interaction, mentoring, and hands-on technical leadership.

Pay and Benefits

Base pay range: $75.00/hr - $85.00/hr
Pay and benefits details are described in the job post and are subject to eligibility and plan terms.

Location and Schedule

Location: Charlotte, NC or Atlanta, GA (hybrid: 4 days onsite, 1 day remote)
Application deadline: Oct 24, 2025

Top Skills and Experience

7+ years of experience within Site Reliability Engineering (SRE).
Focus on SRE practice, production knowledge, and SLO/SLI leadership; DevOps experience is a nice-to-have.
Scripting: proficient in at least one of Python, Go, Bash, JavaScript, or Shell.
Main tech stack: Dynatrace, Datadog, ELK.
Ansible experience is a plus.
SRE certifications and a bachelor’s degree (or equivalent experience) are required.
Fintech experience is highly desirable.

Responsibilities

Define and track reliability and observability OKRs, including SLOs and SLIs.
Implement robust monitoring and alerting to proactively identify issues and support incident response.
Enable AIOps capabilities for auto-response, self-healing, and anomaly trend analysis.
Develop and implement automation to reduce toil and improve efficiency across product engineering and SRE teams.
Identify and address performance bottlenecks in applications and infrastructure.
Collaborate with incident management to minimize downtime and impact on users.
Work with development and operations teams to embed observability and resiliency in deployment and operation.
Lead capacity planning with product, development, infrastructure, and architecture teams to support current and future demand.
Improve reliability by addressing gaps in architecture, services, and tooling.
Modernize disaster recovery for on-premises and cloud-based solutions.
Provide technical leadership and mentorship to other engineers.

Education and Qualifications

Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent experience.
7+ years of IT experience in infrastructure support and development.
7+ years in Site Reliability Engineering and DevOps.
Strong expertise in observability, monitoring, alerting, and logging tools (Dynatrace, Datadog, ELK).
Experience designing on-premises, cloud, and hybrid resiliency, disaster recovery, and business continuity planning.
Cloud computing knowledge (IaaS, PaaS, SaaS).
Experience with Kubernetes, Helm, Prometheus; GitOps with containerization and CI/CD pipelines.
Automation and configuration management tools (GitHub Actions, Terraform, Ansible, Chef, Puppet).
Security best practices in hybrid environments and ability to interpret security frameworks.
Strong leadership, mentoring, and cross-functional collaboration skills.

Behavioral Competencies

Strategic thinking
Influence and organizational navigation
Collaborative and communicative
Problem-solving and adaptability

Work Type

Employment type: Contract
Experience level: Expert
Workplace type: Hybrid (Atlanta, GA)

Company

About TEKsystems: We’re a leading provider of business and technology services, with global reach and a strong commitment to equal opportunity employment.
