Utah Staffing
Senior Manager, Site Reliability Engineering
Utah Staffing, Salt Lake City, Utah, United States, 84193
Senior Manager of Site Reliability Engineering (SRE)
If you're passionate about building a better future for individuals, communities, and our country, and you're committed to working hard to play your part in building that future, consider WGU as the next step in your career. Driven by a mission to expand access to higher education through online, competency-based degree programs, WGU is also committed to being a great place to work for a diverse workforce of student-focused professionals. The university has pioneered a new way to learn in the 21st century, one that has received praise from academic, industry, government, and media leaders. Whatever your role, working for WGU gives you a part to play in helping students graduate, creating a better tomorrow for themselves and their families.

The salary range for this position takes into account the wide range of factors considered in making compensation decisions, including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. At WGU, it is not typical for an individual to be hired at or near the top of the range for their position, and compensation decisions depend on the facts and circumstances of each case. A reasonable estimate of the current range is:
Grade: Management Technical 715
Pay Range: $ - $

The Senior Manager of Site Reliability Engineering (SRE) leads the function responsible for ensuring that critical systems and services are reliable, scalable, and resilient. The role combines technical leadership with organizational management, directing SRE teams in designing, implementing, and operating infrastructure that supports business needs. This position defines service reliability standards, drives incident response practices, oversees automation initiatives, and partners with other engineering and product teams to balance reliability with delivery velocity.
This position's main objective is to improve reliability, performance, and operational efficiency to ensure our students and faculty are delighted with the fully online educational experience.

Primary Responsibilities
Leads and mentors SRE teams, creating an environment that encourages ownership, collaboration, and continuous improvement.
Establishes the SRE vision, goals, and operational strategies in alignment with organizational objectives.
Defines reliability roadmaps and communicates priorities to engineering and executive stakeholders.
Develops, drives, and supports service level objectives (SLOs), indicators (SLIs), and agreements (SLAs) across systems.
Directs incident management processes, including response coordination, root cause analysis, and follow-up actions.
Implements practices that reduce downtime and ensure systems meet availability, scalability, and performance expectations.
Drives adoption of infrastructure as code, CI/CD pipelines, and automated testing to improve operational efficiency.
Oversees monitoring, alerting, and observability systems that provide insight into service health.
Evaluates and implements emerging tools that enhance service reliability and reduce manual toil.
Proactively collects and evaluates system and application data to improve the performance and reliability of the environment.
Partners with software engineering, security, and product teams to integrate reliability into all development lifecycle phases.
Provides senior leadership and other stakeholders with transparent reporting on reliability trends, risks, and improvement initiatives.

Western Governors University is an equal opportunity employer and does not discriminate on the basis of race, color, religion, sex, age, national origin, disability, veteran status, sexual orientation or any other classification protected by federal, state or local law.