Goodwin Recruiting

Sr DevOps Engineer, Cloud Infrastructure

Goodwin Recruiting, Atlanta, Georgia, United States, 30383

This role involves designing, managing, and optimizing the organization's cloud infrastructure across AWS and Azure. It is a hybrid position combining DevOps and Site Reliability Engineering duties to ensure the rapid, reliable, and scalable delivery of software and services. The Senior Platform Engineer will oversee EKS and AKS clusters, maintain CI/CD pipelines in Azure DevOps, and implement monitoring and observability using Datadog to enhance system reliability and operational performance.

Sr DevOps Engineer, Cloud Infrastructure Responsibilities:

- Design, implement, and maintain AWS and Azure cloud infrastructure for scalability, reliability, and cost efficiency.
- Manage and optimize EKS and AKS Kubernetes clusters, including scaling, upgrades, and workload management.
- Develop and improve CI/CD pipelines using Azure DevOps and/or GitHub to accelerate software deployment.
- Set up and manage monitoring, logging, and alerting systems with Datadog, Elastic, and Grafana to proactively identify and resolve issues.
- Drive automation and Infrastructure-as-Code (IaC) initiatives with Terraform.
- Collaborate with development teams to enhance deployment reliability, system performance, and operational processes.
- Participate in incident response, on-call rotations, and disaster recovery planning.
- Promote security best practices across cloud environments, including IAM, networking, and compliance.

Sr DevOps Engineer, Cloud Infrastructure Requirements:

- 10+ years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles.
- Hands-on experience with AWS and Azure services.
- Expertise in managing production EKS and AKS Kubernetes clusters.
- Proven ability to build and maintain CI/CD pipelines using platforms like Azure DevOps, GitHub Actions, Semaphore CI, or Jenkins for reliable, automated deployments.
- Proficiency with monitoring and observability tools, preferably Datadog.
- Strong scripting and automation skills (Python, Bash, PowerShell).
- Experience with Infrastructure-as-Code tools such as Terraform, Bicep, or CloudFormation.
- Solid understanding of networking, security, and system architecture in cloud environments.
- Excellent problem-solving, troubleshooting, and incident management skills.
- Strong communication skills and ability to work effectively across engineering and IT teams.