IBM

Program Director, SRE

IBM, Austin, Texas, US, 78716

Introduction

A career in IBM Software means you’ll be part of a team that transforms our customers’ challenges into solutions.

Seeking new possibilities and always staying curious, we are a team dedicated to creating the world’s leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career. IBM’s product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.

Your Role And Responsibilities

As a Site Reliability Engineering (SRE) Program Director, you will play a pivotal role in leading and driving the SRE program within our organization. You will be responsible for ensuring the reliability, scalability, and performance of the systems and applications that support IBM Software SaaS offerings. The successful candidate will have a strong technical background, exceptional leadership skills, and a proven track record of implementing and optimizing SRE best practices in SaaS environments.

Key Responsibilities

Lead the SRE program strategy and execution across multiple SaaS offerings

Drive reliability engineering practices to ensure high availability and performance of services

Collaborate with engineering, product, and operations teams to embed SRE principles into the software development lifecycle

Oversee incident management processes, including root cause analysis and continuous improvement

Champion automation, observability, and proactive monitoring across systems

Guide the adoption of container orchestration and infrastructure-as-code practices

Mentor and grow a high-performing, globally distributed SRE team

Preferred Education

Bachelor's Degree

Required Technical And Professional Expertise

Proven experience in a leadership role within Site Reliability Engineering or Development, with a focus on supporting SaaS and/or PaaS solutions

Strong understanding of cloud computing platforms (e.g., IBM Cloud, AWS, Azure, GCP) and infrastructure as code

Strong experience with incident management, post-incident analysis, and root cause analysis in a multi-tenant SaaS context

In-depth knowledge of system architecture, networking, and security principles

Expertise in implementing and managing container orchestration platforms (e.g., Kubernetes) for multi-tenant environments

Preferred Technical And Professional Experience

Certification in Site Reliability Engineering or a related field

Excellent communication skills and the ability to collaborate effectively with cross-functional teams

Demonstrated success in leading SRE transformations within organizations, particularly in the context of SaaS platforms

Seniority level: Director

Employment type: Full-time

Job function: Business Development and Sales

Industries: IT Services and IT Consulting