Compunnel, Inc.

Incident And Request Manager

Compunnel, Inc., Atlanta, Georgia, United States, 30383

The Incident & Request Manager is responsible for leading incident response and request management across all non-production environments (Dev, QA, UAT, Performance). This role serves as the escalation point for project and product delivery teams, ensuring incidents are resolved quickly, requests are fulfilled efficiently, and improvements are embedded into processes. The manager oversees a team of Incident Analysts and SREs, collaborates with DevOps teams to automate detection and response, and partners with Environment and Change Managers to reduce recurring issues.

Key Responsibilities

Own the incident lifecycle: detection, triage, response, resolution, and closure. Act as the primary escalation point for project/product delivery teams during incidents in non-production environments. Lead war rooms for critical incidents, coordinating with technical and delivery stakeholders. Escalate to Environment, Change, DevOps, Infra, and Security teams when needed. Track and improve incident SLAs and metrics, including MTTR, MTTD, and availability SLOs.

Request Management

Own request fulfilment for project/product delivery teams (e.g., access, entitlements, environment service requests). Standardize and automate common request types with Intake and DevOps teams. Ensure requests are logged, prioritized, and fulfilled within SLA. Provide visibility to stakeholders on request status.

Manage and mentor Incident Analysts and SREs. Ensure follow-the-sun coverage via offshore/onshore models. Build a culture of blameless incident management, automation-first practices, and continuous learning.

Governance & RCA

Ensure all incidents have documented Root Cause Analysis (RCA). Track corrective and preventive actions, feeding them into Change and Environment management processes. Provide trend reporting and insights to leadership.

SRE & DevOps Alignment

Collaborate with SREs and DevOps to automate incident detection, rollback, and recovery. Integrate observability tools (Splunk, Prometheus, Grafana) into proactive monitoring.

Stakeholder Communication

Deliver timely updates during incidents and delays in request fulfilment. Publish regular reports on incident trends, RCA outcomes, and SLA adherence. Build trust with delivery teams through transparent communication.

Project & Delivery Management

Define activities, responsibilities, milestones, resources, and budgets while ensuring scope, time, and cost compliance. Anticipate risks and apply suitable mitigation and contingency strategies. Track project KPIs such as schedule adherence, cost utilization, quality metrics, and risk index. Implement governance models and software delivery methodologies to improve project KPIs. Drive scope management, estimation, resource planning, risk/issue management, and stakeholder engagement. Oversee testing, defect triage, and RCA processes while ensuring adoption of best practices and reusable assets. Support account management processes and contribute to solution structuring for client engagements. Promote team development, knowledge sharing, and organizational initiatives.

Required Qualifications

8–10 years of experience in Incident Management, Service Operations, or SRE leadership. Proven experience managing Incident Analysts and SRE teams. Strong knowledge of AWS, Kubernetes, CI/CD pipelines, and observability tools (Splunk, Prometheus, Grafana). Solid understanding of ITIL Incident, Problem, and Request Management processes. Excellent crisis management, communication, and stakeholder engagement skills. Proficiency in project management methodologies (Agile/Waterfall), estimation techniques, risk management, and metrics analysis.

Preferred Qualifications

Experience in project or program management for IT service delivery. Hands-on experience driving automation-first practices in incident and request management. Exposure to account management processes, solution structuring, and client-facing delivery roles.
