SRE - Automation Engineer
cyberThink - Austin
Overview
As an SRE Automation Engineer, you will design and implement scalable, resilient, and intelligent automation solutions for large-scale cloud environments. This role requires a strong systems engineering background and an automation-first mindset to reduce manual toil and improve operational efficiency. You will automate infrastructure, integrate tools via APIs, enhance observability, and implement AIOps-driven solutions. This position is ideal for individuals passionate about problem-solving, AI/ML in operations, and innovation in automation.
Key Responsibilities:
- Develop Python-based automation solutions for on-prem (Pivotal Cloud Foundry, Windows & Linux VMs) and cloud infrastructure on GCP and Kubernetes.
- Continuously identify and implement improvements to enhance operational excellence.
- Build scalable and proactive automation solutions.
- Implement and manage configuration automation using Ansible (preferred).
- Integrate various tools and services via APIs and client libraries for seamless interoperability.
- Enhance deployment reliability through automated chaos strategies, failover mechanisms, and self-healing infrastructure.
- Develop proactive monitoring and alerting solutions using Splunk, GCP Operations Suite, Grafana, and Prometheus.
- Conduct deep root cause analysis (RCA) and manage incidents for system failures, then develop automation to prevent recurrence.
- Improve system resilience and tune performance for mission-critical applications.
- Apply AI/ML techniques to automation workflows, enhancing anomaly detection, predictive scaling, and intelligent alerting.
Required Skills, Experiences, Education, and Competencies:
- Strong background in systems engineering with a focus on automation and reliability.
- Proficiency in Python (intermediate to expert level) for developing automation and integrations.
- Hands-on expertise with Kubernetes and cloud platforms (GCP or any major cloud).
- Experience integrating tools and platforms via APIs and client libraries.
- Deep understanding of monitoring and alerting using Splunk, GCP Operations Suite, Grafana, and Prometheus.
- Ability to operate in high-stakes environments where reliability and uptime are critical.
- Strong problem-solving skills to navigate uncertainty and complex challenges.
- Experience with Ansible for infrastructure automation.
- Prior experience working in mission-critical teams managing large-scale, high-availability systems.
- Enthusiasm for AI/ML and AIOps, with a passion for applying them in automation and operations.
The hourly range for roles of this nature is $40.00 to $80.00/hr. Rates are heavily dependent on skills, experience, location, and industry.
cyberThink is an Equal Opportunity Employer.