Huntington National Bank
Site Reliability Engineer (SRE) – II
Huntington National Bank, Columbus, Ohio, United States, 43224
Overview
THIS ROLE DOES NOT SUPPORT SPONSORSHIP CANDIDATES
Summary
As a Site Reliability Engineer (SRE) Level II, you will play a key role in maintaining the availability, scalability, and performance of critical infrastructure and services. You will be responsible for building and automating solutions that enhance system reliability and support continuous delivery. In this role, you will handle more complex operational tasks and incidents, provide mentorship to junior SREs, and collaborate with development teams to ensure systems are designed for reliability from the ground up.
Responsibilities
Lead troubleshooting efforts for high-impact production issues, providing detailed root cause analysis (RCA) and preventative measures. Participate in on-call rotations, acting as an escalation point for Level 1 SREs during major incidents.
Automation & Infrastructure as Code (IaC): Develop and maintain automation scripts and infrastructure using tools like Terraform, Ansible, or CloudFormation. Implement automation solutions to eliminate manual tasks and improve system reliability, scalability, and performance.
Performance & Scalability: Analyze system performance and recommend optimizations for scalability and reliability. Support capacity planning by monitoring system metrics, traffic patterns, and usage trends to predict future resource needs.
Collaborate with software engineering teams to influence design of new services and applications, ensuring they are scalable, reliable, and resilient from the start. Contribute to architectural decisions to ensure fault tolerance, redundancy, and recovery.
Monitoring & Observability: Build and maintain robust monitoring, alerting, and observability solutions to proactively detect and resolve issues before they impact end users. Optimize monitoring tools (e.g., Prometheus, Grafana, Datadog, Dynatrace) and build custom dashboards.
Security & Compliance: Ensure systems and infrastructure are secure, compliant, and aligned with organizational policies and industry best practices. Assist with vulnerability management, system patching, and implementing security measures to protect service integrity and availability.
Lead efforts to continuously improve operational processes, tools, and workflows. Implement and enforce best practices in deployment, monitoring, and incident management to reduce downtime.
Qualifications
Basic Qualifications
Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent work experience.
3 years of experience in site reliability engineering, DevOps, systems administration, or related roles.
Proven track record of managing complex infrastructure, troubleshooting production issues, and optimizing system performance.
Preferred Qualifications
Strong experience with Linux/Unix administration and scripting (e.g., Python, Bash, Go).
5 years of experience in site reliability engineering, DevOps, systems administration, or related roles.
Deep understanding of cloud platforms (AWS, GCP, Azure) and related services (EC2, S3, Lambda, Kubernetes, etc.).
Experience with containerization and orchestration (Docker, Kubernetes).
Proficiency with monitoring and observability tools such as Dynatrace, Prometheus, Grafana, Datadog, ELK Stack, or similar.
Strong networking fundamentals (DNS, HTTP, TCP/IP), load balancing, and CDNs.
Experience with CI/CD tools (Jenkins, GitLab CI, CircleCI) and infrastructure automation (Terraform, Ansible, Puppet).
Familiarity with distributed systems and microservices architecture.
Excellent problem-solving and troubleshooting skills, especially in diagnosing production issues in high-scale environments.
Experience working in a multi-platform environment and ability to balance development and support roles.
Strong analytical and communication skills; strong interpersonal skills with a focus on customer service.
Additional Details
Exempt Status: Yes (not eligible for overtime pay).
Workplace Type: Office; some roles may offer flexible work arrangements per Huntington’s policy.
Equal Opportunity Employer; Tobacco-Free Hiring Practice.
Note to Agency Recruiters: Huntington Bank will not pay a fee for unsolicited resumes. All unsolicited resumes sent to Huntington Bank colleagues will be considered Huntington Bank property. A valid Master Service Agreement and Statement of Work are required for agency consideration.