Omni Inclusive
AWS Resilience Hub, AWS Well-Architected Framework, CloudWatch, EventBridge, AWS Config, AWS Security Hub, AWS networking (VPCs, Route 53, etc.), GoLang
Terraform, regulatory compliance (e.g., SOC 2, ISO 27001), Datadog, Elastic, SRE or DevOps methodologies
Key Responsibilities

Monitoring & Alerting
• Continuously monitor AWS Resilience Hub to track resilience scores and compliance with resilience policies.
• Configure and manage alerts for deviations from resilience baselines and operational thresholds.
• Monitor Step Functions and Lambda function executions for failures, and follow playbooks to respond.
• Verify that Resilience Hub logs are successfully delivered to our enterprise logging solution.

Maintenance & Improvement
• Work with engineering teams to implement resilience recommendations and automate recovery processes.

Incident Response & Troubleshooting
• Investigate alerts and performance degradation issues, ensuring quick remediation.
• Collaborate with the Public Cloud Security and Infrastructure teams to improve processes.
• Maintain documentation on recovery procedures, resilience scoring, and architectural improvements.

Automation & Optimization
• Maintain and update Infrastructure as Code (IaC) templates in Terraform to manage configurations.
• Optimize cost and performance while following resilience best practices.

Required Skills & Qualifications
• Strong hands-on experience with AWS Resilience Hub and the AWS Well-Architected Framework
• Proficiency in AWS monitoring tools: CloudWatch, EventBridge, AWS Config, and AWS Security Hub
• Hands-on experience with Infrastructure as Code tools such as Terraform
• Strong software development experience, specifically in Go (GoLang)
• Familiarity with AWS networking (VPCs, Route 53, etc.)
• Understanding of regulatory compliance frameworks (e.g., SOC 2, ISO 27001) as they relate to cloud resilience
• Experience with observability platforms such as Datadog and Elastic
• Background in SRE or DevOps methodologies