TEKsystems

Site Reliability Engineer, SRE

TEKsystems, Atlanta, Georgia, United States, 30383

Duration: 3-month W-2 contract-to-hire
Location: 4 days onsite and 1 day remote, Charlotte, NC or Atlanta, GA

*Top Skills' Details*
1. 7+ years of experience within SRE. The hiring manager is more focused on SRE practice (being able to bring knowledge of production to meet SLOs) and less focused on DevOps (more of a nice-to-have).
2. Solid scripting knowledge and experience in any of the following: Python, Go, Bash, JavaScript, and Shell (does not need to be proficient in all of these, just very knowledgeable in at least one of them).
3. Main tech stack: Dynatrace, Datadog, ELK. Ansible experience is a nice-to-have, as they are starting to use it.
4. SRE certifications are a requirement, as is a bachelor's degree (see the education requirement in the job description).

*** Fintech experience is a very nice to have ***

*Description*
This customer is currently on its journey of establishing an SRE practice/platform. The goal is to build solid observability for the platform and evaluate the tool stack they will look to implement. They have a code team in place now and are looking to augment that team with a staff engineer. They need someone with industry knowledge in SRE: working with vendors, hands-on technical experience, providing technical guidance to more junior team members, and helping lead migrations from one tool to another.

*Key Functions/Duties of Position:*
* Define and track reliability and observability OKRs. This includes defining and tracking Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
* Implement robust monitoring and alerting systems to proactively monitor health, identify potential issues, analyze system performance, and facilitate quick response to incidents.
* Implement AIOps functionality to enable auto-response, self-healing, and anomaly trend analysis.
* Drive the development and implementation of automation solutions to remove "toil", streamline processes, reduce manual interventions, and enhance the overall efficiency of the product engineering and SRE teams.
* Identify and address performance bottlenecks in applications and infrastructure to improve efficiency and user experience.
* Work closely with incident management to quickly address and resolve system outages or performance issues, minimizing downtime and impact on users.
* Collaborate actively with development and operations teams to implement observability and resiliency requirements, ensuring smooth deployment and operation of software systems.
* Lead the coordination with product, development, infrastructure, and architecture teams to conduct capacity planning, ensuring that systems can handle current and future demand; anticipate growth and scalability requirements.
* Improve reliability by identifying and addressing gaps in our architecture, services, and tooling.
* Modernize the disaster recovery program for both on-premises and cloud-based Berkley solutions.
* Provide technical leadership and mentorship to other engineers, fostering a culture of learning and continuous improvement.

*Education Requirement*
* Bachelor's degree in Computer Science, Information Technology, or a related field (or a combination of education and equivalent experience).

*Qualifications:*
* 7+ years of IT experience working with infrastructure support and development.
* 7+ years of experience in Site Reliability Engineering and DevOps.
* Proficient in scripting languages like Python, Go, Bash, and/or JavaScript, and experience with Shell scripting.
* Strong expertise in observability, monitoring, alerting, and logging tools (Dynatrace, Datadog, ELK Stack).
* Practical, hands-on expertise in creating and implementing logging and monitoring architectures.
* Expertise in designing and implementing on-premises, cloud, and hybrid resiliency solutions (HA, AA, AP), disaster recovery, and business continuity planning.
* Deep understanding of cloud computing principles, including IaaS, PaaS, and SaaS models.
* Experience with Kubernetes and other auto-scaling tools and technologies, including proficiency with tools such as Helm and Prometheus for deployment and monitoring.
* Proficient in leveraging GitOps with containerization technologies and CI/CD pipelines.
* Experience developing and implementing automated system reliability and performance solutions, including infrastructure automation and configuration management tools (GitHub Actions, Terraform, Ansible, Chef, Puppet).
* Solid understanding of security best practices in on-premises, cloud, and hybrid environments, along with network technologies.
* Understanding of industry-standard security frameworks and ability to interpret them for Berkley environments.
* Ability to drive critical issues and system design discussions and moderate between multiple technology teams.
* Demonstrated leadership experience, including mentoring junior engineers and leading technical projects.
* Excellent problem-solving skills and the ability to troubleshoot complex issues in a distributed hybrid environment.
* Strong communication skills to collaborate effectively with cross-functional teams and convey technical concepts to non-technical stakeholders.

*Behavioral Core Competencies:*
* Strategic
* Influential
* Organizational Navigation
* Balanced Approach
* Commandership Skills
* Composure

*Experience Level*
Expert Level

*Pay and Benefits*
The pay range for this position is $73.68 - $73.68/hr. Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, the benefits available for this temporary role may include the following:
* Medical, dental & vision
* Critical Illness, Accident, and Hospital
* 401(k) Retirement Plan - Pre-tax and Roth post-tax contributions available
* Life Insurance (Voluntary Life & AD&D for the employee and dependents)
* Short and long-term disability
* Health Spending Account (HSA)
* Transportation benefits
* Employee Assistance Program
* Time Off/Leave (PTO, Vacation or Sick Leave)

*Workplace Type*
This is a hybrid position in Atlanta, GA.

*Application Deadline*
This position is anticipated to close on Sep 16, 2025.

About TEKsystems:

We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.

The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.

About TEKsystems and TEKsystems Global Services

We're a leading provider of business and technology services. We accelerate business transformation for our customers. Our expertise in strategy, design, execution and operations unlocks business value through a range of solutions. We're a team of 80,000 strong, working with over 6,000 customers, including 80% of the Fortune 500 across North America, Europe and Asia, who partner with us for our scale, full-stack capabilities and speed. We're strategic thinkers, hands-on collaborators, helping customers capitalize on change and master the momentum of technology. We're building tomorrow by delivering business outcomes and making positive impacts in our global communities. TEKsystems and TEKsystems Global Services are Allegis Group companies. Learn more at TEKsystems.com.
