The Hartford
Staff Reliability Engineer – Application Owner
The Hartford, Chicago, Illinois, United States, 60290
Overview
The Staff Reliability Engineer – Application Owner at The Hartford is responsible for end-to-end reliability, performance, and lifecycle management of critical applications within the Claims and Operations IT ecosystem. The role collaborates with engineering, infrastructure, and business teams to ensure systems are resilient and continuously improving.
Responsibilities
Application Ownership: serve as the technical owner for one or more applications, ensuring reliability, scalability, and performance; drive adoption of observability, automation, and incident prevention; ensure compliance with enterprise architecture, security, and regulatory standards.
Reliability Engineering: design and implement automation to reduce manual toil; build and maintain monitoring, alerting, and self-healing capabilities; lead root cause analysis and implement long-term fixes for recurring issues.
DevSecOps & CI/CD: collaborate with DevOps teams to enhance CI/CD pipelines for secure and efficient deployments; integrate security and compliance checks into the software delivery lifecycle; promote infrastructure-as-code practices using Terraform or CloudFormation.
Incident & Problem Management: lead triage and resolution of high-severity incidents; improve incident response processes and reduce mean time to recovery; maintain documentation, runbooks, and operational metadata.
Collaboration & Influence: partner with development, QA, and infrastructure teams to drive reliability initiatives; contribute to the Reliability Engineering Community of Practice; mentor junior engineers and promote continuous improvement.
Qualifications
Technical Expertise: 7+ years of experience in software engineering, SRE, or application support; strong knowledge of AWS services (EC2, Lambda, S3, CloudWatch, IAM); scripting and automation (Python, NodeJS, Bash, PowerShell); observability tools (Dynatrace, Splunk, Prometheus, Grafana); CI/CD tools (Jenkins, GitHub Actions, Azure DevOps); containerization (Docker, Kubernetes, ECS/EKS); infrastructure-as-code (Terraform, CloudFormation).
Operational Excellence: proven ability to lead incident response and root cause analysis; experience implementing SLIs, SLOs, and SLAs; ability to design and implement runbooks and automated health checks.
Security & Compliance: understanding of DevSecOps principles and secure software delivery; familiarity with compliance frameworks (SOC2, HIPAA, PCI-DSS).
Collaboration & Communication: strong cross-functional collaboration and communication skills; ability to explain technical concepts to non-technical stakeholders; experience mentoring or leading technical discussions.
Preferred Qualifications: AWS Certified DevOps Engineer, CKA, or Google SRE certification; experience in financial services or insurance, especially contact center or claims operations; exposure to hybrid cloud environments and legacy modernization.
Why Join Us?
This role is part of a strategic transformation to embed Reliability Engineering across Claims and Operations IT. You’ll help modernize systems, improve customer experience, and drive operational excellence.
Location & Schedule
This role has a hybrid schedule with in-office work 3 days a week (Tuesday–Thursday) in Columbus, OH; Chicago, IL; Hartford, CT; or Charlotte, NC. Candidates must be authorized to work in the US without company sponsorship.
Compensation
The listed base pay range is $126,160 - $189,240 and may vary based on factors including performance and competencies. The Hartford offers additional rewards as part of total compensation.