TEKsystems

Staff Engineer, SRE

TEKsystems, Atlanta, Georgia, United States, 30383

Overview

This customer is currently establishing an SRE practice/platform. The goal is to build solid observability of the platform and evaluate the tool stack they plan to implement. They have a code team in place now and want to augment that team with a staff engineer. They need someone with industry knowledge in SRE: experience working with vendors, hands-on technical experience, the ability to give technical guidance to more junior members of the team, and the ability to help lead migrations from one tool to another. As of now they do not need an SE or SA; they are focused on strategy, and the primary tasks are dashboarding and setting up the tools.

Key Functions / Duties of Position

Define and track reliability and observability OKRs, including Service Level Objectives (SLOs) and Service Level Indicators (SLIs).

Implement robust monitoring and alerting systems to proactively monitor health, identify potential issues, analyze system performance, and facilitate quick response to incidents.

Implement AIOps functionality to enable auto-response, self-healing, and anomaly trend analysis.

Drive the development and implementation of automation solutions to remove toil, streamline processes, reduce manual interventions, and enhance efficiency of the product engineering and SRE teams.

Identify and address performance bottlenecks in applications and infrastructure to improve efficiency and user experience.

Work closely with incident management to quickly address and resolve system outages or performance issues to minimize downtime and impact on users.

Collaborate with development and operations teams to implement observability and resiliency requirements for smooth deployment and operation of software systems.

Lead coordination with product, development, infrastructure, and architecture teams to conduct capacity planning and ensure systems can handle current and future demand.

Improve reliability by identifying and addressing gaps in architecture, services, and tooling.

Modernize disaster recovery programs for both on‑premises and cloud-based Berkley solutions.

Provide technical leadership and mentorship to other engineers, fostering a culture of learning and continuous improvement.

Education Requirement

Bachelor's degree in Computer Science, Information Technology, or a related field (or an equivalent combination of education and experience).

Qualifications

7+ years of IT experience working with infrastructure support and development

7+ years of experience in Site Reliability Engineering and DevOps

Proficiency in scripting languages such as Python, Go, Bash, and/or JavaScript, and experience with shell scripting

Strong expertise in observability, monitoring, alerting, and logging tools (Dynatrace, Datadog, ELK Stack)

Hands-on experience creating and implementing logging and monitoring architectures

Experience designing and implementing on‑premises, cloud, and hybrid resiliency solutions (HA, AA, AP), disaster recovery, and business continuity planning

Deep understanding of cloud computing principles (IaaS, PaaS, SaaS)

Experience with Kubernetes and auto-scaling tools; proficiency with Helm and Prometheus

Proficient in GitOps with containerization and CI/CD pipelines

Experience with automated system reliability and performance solutions (GitHub Actions, Terraform, Ansible, Chef, Puppet)

Solid understanding of security best practices in on‑premises, cloud, and hybrid environments

Ability to drive critical issues and system design discussions across multiple technology teams

Demonstrated leadership in mentoring junior engineers and leading technical projects

Strong problem-solving skills and ability to troubleshoot complex issues in distributed environments

Excellent communication skills to collaborate with cross-functional teams and convey technical concepts to non-technical stakeholders

Behavioral Core Competencies

Strategic

Influential

Organizational Navigation

Balanced Approach

Commandership Skills

Composure

Pay and Benefits

The pay range for this position is $80.00 - $80.00/hr. Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, benefits for this temporary role may include:

Medical, dental & vision

Critical Illness, Accident, and Hospital coverage

401(k) Retirement Plan with pre-tax and Roth post-tax contributions

Life Insurance (Voluntary Life & AD&D for employee and dependents)

Short and long-term disability

Health Spending Account (HSA)

Transportation benefits

Employee Assistance Program

Time Off / PTO

Workplace Type

This is a hybrid position in Atlanta, GA.

Application Deadline

This position is anticipated to close on Oct 2, 2025.

About TEKsystems

We are partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. TEKsystems is an Allegis Group company.

Equal Opportunity Employer

The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.
