New York Life

Senior Associate - DR Recovery Lead (IT Operations)

New York Life, New York, New York, US, 10261

Location Designation: Hybrid - 3 days per quarter

As part of Technology, you'll have the opportunity to contribute to groundbreaking initiatives that shape New York Life's digital landscape. Leverage cutting‑edge technologies like Generative AI to increase productivity, streamline processes, and create seamless experiences for clients, agents, and employees. Your expertise fuels innovation, agility, and growth, driving the company's success.

Role Summary

New York Life is standing up a repeatable, automation‑first Disaster Recovery (DR) operating model to ensure we can sustain a Minimum Viable Company (MVC) and recover priority services within 48 hours. As the DR Recovery Lead (IT Ops), you will be the single‑threaded owner of day‑to‑day DR operations: driving orchestration execution, maintaining infrastructure and application runbooks, coordinating cross‑technology teams and vendors, and ensuring audit‑ready evidence for quarterly exercises and an annual recovery test calendar. You’ll also align DR with enterprise architecture and regulatory standards and continuously improve our capabilities.

What You’ll Do

Own DR operations & runbooks:

Build, maintain, and continuously improve infrastructure and application recovery runbooks (including integrations and upstream/downstream interfaces) aligned to the enterprise DR framework and RACI.

Execute orchestrated recoveries:

Lead automation‑first recovery using IaC/pipelines and an evidence harness that captures artifacts, health checks, and outcomes for audit.

Plan & run tests:

Lead quarterly tabletop/functional validations, drive an annual DR exercise calendar, and manage test evidence and acceptance with business owners.

Safeguard environments:

Monitor configuration parity and drift; ensure DR capacity/readiness across failover patterns; coordinate change windows with APSO/CAB.

Restore securely:

Coordinate restoration of IAM and keys/certs, and the re‑enablement of security controls, in alignment with cyber‑incident procedures.

Recover data with integrity:

Partner with DBA/Data teams on backup/restore or replication, validation, and reconciliation steps.

Prove service health:

Define SLIs/SLOs, run synthetic probes, and publish dashboards to verify recoverability.

Manage vendors:

Orchestrate third‑party SLAs, negotiate test windows, and validate contractual obligations and evidence.

Map & prioritize services:

Maintain Critical Business Service (CBS) inventories and dependencies; scale playbooks across priority CBS.

Lead during incidents:

Serve as DR operations lead for activation, coordinating comms and cross‑tech execution through recovery.

Added Focus Areas

Architectural alignment:

Ensure DR strategies, patterns, and runbooks conform to enterprise architecture standards, reference architectures, and future‑state infrastructure plans; participate in design reviews and provide DR non‑functional requirements.

Multi‑cloud & cloud‑native DR:

Engineer and operate DR solutions across on‑prem and multi‑cloud environments (e.g., AWS/Azure), leveraging cloud‑native patterns such as active/active, regional failover, immutable infrastructure, and serverless recovery.

Regulatory & compliance:

Embed controls and evidence to meet NYDFS, SOX, GDPR, and related obligations; align to NIST (e.g., SP 800‑34/61) and ISO 22301 principles; maintain audit‑ready artifacts and traceability.

Continuous improvement & innovation:

Drive quarterly improvement backlogs; pilot emerging techniques (e.g., chaos engineering/game days, AI‑assisted recovery validation), retire manual steps, and report ROI.

Who You’ll Bring Together

Enterprise Architecture (EA), Value Stream (VS) architects, App Owners/Devs, IT Ops, Security, DBA/Data, SRE/Observability, APSO/Change Management, and key vendors/third parties.

Qualifications

8+ years in IT Operations / SRE / DR or equivalent enterprise resiliency roles.

Hands‑on experience with DR patterns (active/active, active/passive), backup/restore & replication, and hybrid/multi‑cloud infrastructure.

Strong automation/IaC background (e.g., Terraform/CloudFormation), CI/CD pipelines, and scripting (PowerShell, Bash, or Python).

Proven test planning & execution (tabletops through functional validation) with rigorous evidence capture.

Familiarity with security control restoration (IAM, PKI, secrets) and alignment to cyber‑incident runbooks.

Observability expertise (health checks, synthetic probes, SLIs/SLOs, dashboards).

Effective vendor management, change/incident coordination (e.g., APSO/CAB), and cross‑functional facilitation in a RACI‑governed program.

Architecture & compliance literacy: ability to interpret EA standards and reference architectures; working knowledge of NYDFS, SOX, GDPR, NIST, and ISO 22301 expectations as they relate to DR.

Excellent communication, leadership, and decision‑making under pressure.

Nice to Have

Financial services or other regulated‑industry experience.

Certifications: ITIL, AWS/Azure architect/ops, DRII/BCI, or security (e.g., CISSP, GCIH).

Experience with chaos engineering, game‑day design, and AI‑assisted testing or recovery validation.

Pay Transparency Salary Range: $108,500-$155,500

Overtime eligible: Exempt

Discretionary bonus eligible: Yes

Sales bonus eligible: No

Actual base salary will be determined based on several factors, including but not limited to the individual’s experience, skills, qualifications, and job location. Additionally, employees are eligible for an annual discretionary bonus. In addition to base salary, employees may also be eligible to participate in an incentive program.

#J-18808-Ljbffr