First Citizens Bank
Lead Software Engineer
We are seeking an experienced SRE Lead to build and maintain reliable, scalable infrastructure supporting our data engineering platform. This lead role focuses on ensuring data systems' operational excellence, automation, cost optimization, and disaster recovery while mentoring a growing team.
Responsibilities
Infrastructure & Platform Reliability
Maintain highly available, fault-tolerant data platforms on Snowflake, dbt Cloud, and AWS; establish SLOs/SLAs and implement monitoring to meet them.
Own incident response processes, post‑mortem culture, and continuous improvement; reduce MTTR through root‑cause analysis and preventative measures.
Observability & Monitoring
Implement comprehensive monitoring, alerting, and logging across data infrastructure using tools like Splunk, Dynatrace or similar, and design dashboards for real‑time visibility into system health.
Monitor dbt Cloud jobs, Airflow DAGs, and Snowflake performance.
Design anomaly detection and proactive alerting to prevent data incidents before they impact users.
Infrastructure as Code & Automation
Lead IaC initiatives using Terraform for AWS resources, Snowflake provisioning, and dbt Cloud configuration.
Manage deployment pipelines, scaling policies, and resource provisioning to reduce manual toil.
Build self‑service tools and runbooks enabling engineers to safely operate infrastructure.
Cost Optimization & Resource Management
Conduct regular cost audits; optimize Snowflake warehouse sizing, query performance, and cluster configurations; implement auto‑suspend/auto‑resume policies.
Monitor cloud resource utilization across compute, storage, and data transfer; identify cost‑saving opportunities and implement chargeback models.
Balance performance with cost through intelligent caching, compression, materialized views, and query optimization recommendations.
Security, Governance & Compliance
Enforce RBAC/ABAC policies, network segmentation, and encryption at rest and in transit; manage secrets, API keys, and credentials.
Take part in security reviews, penetration testing, and threat modeling; maintain disaster recovery and business continuity plans.
Team Leadership & Knowledge Sharing
Lead and mentor junior SREs; establish technical standards, best practices, and on‑call rotations.
Drive documentation culture; maintain runbooks, architecture diagrams, and troubleshooting guides for operational knowledge transfer.
Collaborate with data engineers on reliability concerns; advise on architecture decisions with production readiness in mind.
Capacity Planning & Disaster Recovery
Forecast infrastructure capacity needs; plan for growth and resource scaling aligned with business requirements.
Regularly test disaster recovery procedures; maintain backups, perform failover drills, and document recovery time objectives (RTOs) and recovery point objectives (RPOs).
Qualifications
Bachelor's degree and 6 years of experience in software application development and maintenance, OR high school diploma or GED and 10 years of experience in software application development and maintenance
Snowflake Platform (deep operational knowledge: warehouses, clustering, query optimization, costs)
Infrastructure as Code: Terraform, CloudFormation, or similar (AWS‑focused preferred)
Data orchestration: Airflow, Dagster, dbt Cloud operational patterns
Observability tools: Splunk, Dynatrace, Datadog, CloudWatch, Prometheus/Grafana, or equivalent
CI/CD & Git workflows: GitHub, GitLab, Azure DevOps (AZDO), or similar
AWS services (Data): EC2, S3, Glue, Lambda, RDS, networking, and cost management
Linux/Unix system administration and troubleshooting
Python or SQL for automation and tooling
Incident management and post‑mortem discipline
Core Competencies
Systems thinking and holistic problem‑solving
Strong communication and cross‑functional collaboration
Technical depth with operational breadth
Proactive mindset: anticipate failure, prevent incidents, improve continuously
Comfort with on‑call responsibilities and urgent troubleshooting
Ability to balance automation ROI with immediate operational needs
Mentoring and team building capabilities
Experience Required
7+ years in SRE, DevOps, or Infrastructure Engineering
3+ years in a lead or senior technical role
2+ years supporting data platforms or analytics infrastructure (Snowflake, dbt, data warehouses)
Benefits
First Citizens Bank is committed to providing a competitive, thoughtfully designed benefits program to meet the needs of our associates. More information can be found at https://jobs.firstcitizens.com/benefits.
Job Details
Seniority level: Not Applicable
Employment type: Full‑time
Job function: Information Technology
Industry: Banking and Financial Services