Jobs via Dice
Site Reliability Engineering Lead - Atlanta, GA
Jobs via Dice, Georgia Center, Vermont, United States
Location: Atlanta, GA (hybrid from day 1, 2 days per week); W2 candidates only; no C2C
Overview
Client: QTech US Inc.
Position: Site Reliability Engineering Lead
This role focuses on defining SRE practices, observability, and reliability improvements for AWS-based architectures.
Responsibilities
Work with the team to define the SRE maturity model, observability strategy, and AWS reliability roadmap, and to identify gaps.
Translate business SLAs into SLIs, SLOs, and error budgets.
Lead and implement AWS serverless reliability architecture (multi-region, failover, self-healing).
Define observability blueprints (logs, metrics, traces, UX telemetry).
Define cost-optimized data observability and resiliency solutions.
Design and implement fault-tolerant, highly available AWS architectures.
Apply experience with DynamoDB global tables, RDS failover, and capacity planning.
Apply SRE principles: SLIs, SLOs, SLAs, error budgets, and toil reduction.
Drive chaos engineering, disaster recovery, and capacity planning exercises.
Qualifications
Senior- or lead-level experience in Site Reliability Engineering or a related field.
Strong understanding of AWS architecture, serverless design, and reliability patterns.
Experience with observability tooling and bringing SLIs/SLOs to life.
Job Function
Engineering and Information Technology
Industries
Software Development