TechDigital Group
Site Reliability Engineering (SRE) Consultant
TechDigital Group, Raleigh, North Carolina, United States
Role Summary
The SRE Consultant will serve as a strategic and technical leader, driving the transformation of production operations into a Site Reliability Engineering (SRE) model across our financial client's ETO LOBs. This role blends engineering excellence with operational rigor, focusing on measurable outcomes such as reduced toil, improved system reliability, and faster incident resolution. The consultant will work with our financial client and Cognizant teams to analyze the current state and develop a detailed, tower-specific roadmap for transitioning the teams from Production Operations to the SRE model. Location: Charlotte, NC; work from our financial client's office per client policy (currently 3 days a week). No travel option is offered.
Key responsibilities
Work with our financial client's Reliability Engineering (ARE) team to understand the hub's operations and the specific needs of individual LOBs. Develop a comprehensive view of how the SRE model can be defined and implemented, assess the existing SRE models within each LOB, and identify any gaps that need to be addressed.
Support LOBs in implementing and maturing SRE practices: conduct thorough analyses, propose viable options, and contribute to the adoption of the new operating model.
Define the approach for SRE engagement models based on tower maturity and skill readiness.
Conduct maturity assessments across towers to identify gaps in SRE capabilities and recommend targeted interventions.
Map existing Production Operations functions and team members to SRE roles, identifying skill gaps and reskilling needs.
Develop and execute a tower-specific roadmap for SRE adoption, including coaching plans, tooling strategies, and governance models.
Provide a roadmap for transitioning from traditional Production Operations to an SRE-driven model, defining clear boundaries between retained operations and SRE ownership.
Create the transition plan from Production Operations to the SRE model, identifying which functions should remain in production operations and which should move to SRE. Evaluate team members for potential movement to SRE, assess their skills, and make retention and replacement decisions based on the needs of each individual tower.
Qualifications
15+ years in software engineering or production operations, with 5+ years in SRE leadership.
Deep expertise in cloud platforms (AWS, Azure), observability tools (Dynatrace, Splunk, Datadog, etc.), and automation frameworks (Terraform, Ansible).
Proven experience in leading SRE transformations across enterprise-scale environments.
Strong communication and presentation skills, with the ability to influence senior stakeholders.