Diligent Tec, Inc
Position
Incident & Request Manager - Non-Production Environments
Location: Atlanta, GA / Bellevue, WA
Contract Duration: 6 Months
Job Type: Temporary Assignment
Work Type: Onsite
Job Description
The Incident & Request Manager leads the incident response and request management function for all non-production environments (Dev, QA, UAT, Performance). Acting as the escalation point for project/product delivery teams, this role ensures incidents are resolved quickly, requests are fulfilled efficiently, and learnings are embedded into continuous improvement.
Key Responsibilities
Own the incident lifecycle: detection, triage, response, resolution, and closure.
Act as the primary escalation point for project/product delivery teams during non-production environment (NPE) incidents.
Lead war rooms for critical incidents, coordinating with technical and delivery stakeholders.
Ensure timely escalation to Environment, Change, DevOps, Infra, and Security teams when required.
Track and improve incident SLAs and the metrics behind them (MTTR, MTTD, availability SLOs).
Own request fulfilment for project/product delivery teams (e.g., access, entitlements, environment service requests).
Standardize and automate common request types in collaboration with Intake and DevOps teams.
Ensure requests are logged, prioritized, and fulfilled within SLA.
Provide transparency to stakeholders on request status.
Manage and mentor Incident Analysts and SREs.
Ensure follow-the-sun coverage via offshore/onshore teams.
Build a culture of blameless incident management, automation-first practices, and continuous learning.
Ensure all incidents have documented Root Cause Analysis (RCA).
Track corrective and preventive actions, and feed them into Change and Environment management processes.
Provide trend reporting and insights to leadership.
Work with SREs and DevOps teams to automate incident detection, rollback, and recovery.
Integrate observability tools (Splunk, Prometheus, Grafana) into proactive monitoring.
Provide timely updates during incidents and whenever request fulfilment is delayed.
Publish regular reports on incident trends, RCA outcomes, and SLA adherence.
Maintain trust with project/product delivery teams by ensuring transparent communication.
Required Skills & Experience
8-10 years in Incident Management, Service Operations, or SRE leadership.
Experience managing Incident Analysts and SRE teams.
Strong knowledge of AWS, Kubernetes, CI/CD pipelines, and observability tools (Splunk, Prometheus, Grafana).
Deep understanding of ITIL Incident, Problem, and Request Management processes.
Excellent crisis management, communication, and stakeholder engagement skills.