Microsoft
Senior Lead Incident Manager – Site Reliability Engineer - CTJ - Poly
Microsoft, Reston, Virginia, United States, 22090
The Azure Senior Incident Manager - Site Reliability Engineer is responsible for driving the resolution of complex, multi-service outages across Azure’s global infrastructure in our Air Gap Clouds. This role provides operational leadership during high-severity incidents, ensuring timely mitigation, clear stakeholder communication, and adherence to compliance and privacy standards. The position requires technical breadth, demonstrated leadership under pressure, and the ability to coordinate across engineering, operations, and customer-facing teams.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities
Command & Control: Act as the primary incident commander for major Azure outages, ensuring forward progress and clarity throughout the incident lifecycle.
Incident Leadership: Lead cross-functional teams (engineering, support, operations) to restore services quickly and minimize customer impact.
Provide timely, accurate updates to executives, internal stakeholders, and customer-facing teams.
Process Governance: Ensure adherence to incident management protocols, including legal, privacy, and compliance requirements.
Continuous Improvement: Conduct Post-Incident Reviews (PIRs), identify systemic issues, and drive platform improvements.
Tooling & Automation: Leverage and enhance incident management tools such as Outage Hub and IcM for real-time visibility and coordination.
Mentorship: Guide and coach other incident managers and engineers on best practices for incident response.
Rhythm of Business: Ensure our Executive Leaders receive regular updates, critical signals, and progress reports on cloud-wide initiatives.
Embody our culture and values.
Required / Minimum Qualifications
Master’s Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration
or Bachelor’s Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
or equivalent experience.
Preferred / Additional Qualifications
Leadership: Proven ability to lead global, distributed teams during high-pressure situations.
Innovation: Track record of implementing automation and process improvements in incident management.
Security Clearance Requirements
Active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on a Single Scope Background Investigation (SSBI) with Polygraph.
Verification of clearance required prior to offer.
Microsoft Cloud background check upon hire and every two years thereafter.
U.S. citizenship verification required.
Compensation & Benefits
Site Reliability Engineering IC3: Base pay range USD $100,600 – $199,000 per year across the U.S.; in the San Francisco Bay Area and New York City metropolitan area, base pay range USD $131,400 – $215,400 per year. Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay.
Microsoft will accept applications for the role until October 27, 2025.
Equal Opportunity Employer
Microsoft is an equal‑opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.