Site Reliability Engineer (SRE)
HTP Solutions - Irving
Overview
We are seeking a highly skilled Site Reliability Engineer (SRE) with expertise in Microsoft Azure and Infrastructure as Code (IaC) using Terraform. As an SRE, you will be responsible for maintaining high availability, scalability, and performance of critical systems while driving infrastructure automation and reliability best practices.
Key Responsibilities:
• Design, implement, and maintain scalable, resilient, and secure cloud infrastructure on Azure.
• Automate provisioning, configuration, and monitoring of infrastructure using Terraform.
• Develop and enforce SRE principles like SLAs, SLOs, SLIs, and error budgets.
• Build observability and monitoring systems using tools such as Azure Monitor, Log Analytics, Prometheus, and Grafana.
• Respond to production incidents, conduct root cause analysis, and drive incident resolution.
• Implement CI/CD pipelines and integrate automated testing and deployment.
• Collaborate with software engineering, DevOps, and IT operations teams to enhance system performance and reliability.
• Optimize cost, performance, and security of cloud infrastructure.
• Maintain runbooks and documentation, and conduct regular disaster recovery and failover exercises.
Required Qualifications:
• 3-7+ years of experience as a Site Reliability Engineer, DevOps Engineer, or Cloud Engineer.
• Strong hands-on experience with Microsoft Azure services (VMs, AKS, App Services, Azure SQL, Storage, etc.).
• Proficiency with Terraform for infrastructure provisioning and configuration management.
• Experience with CI/CD tools such as Azure DevOps, GitHub Actions, Jenkins, or similar.
• Strong understanding of Linux/Windows systems administration.
• Familiarity with containerization and Kubernetes (especially AKS).
• Experience with monitoring and logging tools (e.g., Azure Monitor, App Insights, ELK, Prometheus, Grafana).
• Proficient in scripting languages (e.g., PowerShell, Bash, Python).
• Strong troubleshooting and incident management skills.
Preferred Qualifications:
• Certifications: Microsoft Certified: Azure Administrator / DevOps Engineer Associate.
• Experience with GitOps, Helm, or Service Mesh (Istio, Linkerd).
• Knowledge of security best practices and compliance in cloud environments.
• Exposure to Agile/DevOps culture and practices.
Soft Skills:
• Strong communication and collaboration skills.
• Ability to thrive in a fast-paced, dynamic environment.
• Problem-solving mindset with attention to detail.
• Team player with a proactive attitude.