HTP Solutions

Site Reliability Engineer (SRE)

HTP Solutions, Irving, Texas, United States, 75084

We are seeking a highly skilled Site Reliability Engineer (SRE) with expertise in Microsoft Azure and Infrastructure as Code (IaC) using Terraform. As an SRE, you will be responsible for maintaining high availability, scalability, and performance of critical systems while driving infrastructure automation and reliability best practices.

Key Responsibilities:
• Design, implement, and maintain scalable, resilient, and secure cloud infrastructure on Azure.
• Automate provisioning, configuration, and monitoring of infrastructure using Terraform.
• Develop and enforce SRE principles such as SLAs, SLOs, SLIs, and error budgets.
• Build observability and monitoring systems using tools such as Azure Monitor, Log Analytics, Prometheus, and Grafana.
• Respond to production incidents, conduct root cause analysis, and drive incident resolution.
• Implement CI/CD pipelines and integrate automated testing and deployment.
• Collaborate with software engineering, DevOps, and IT operations teams to enhance system performance and reliability.
• Optimize cost, performance, and security of cloud infrastructure.
• Maintain runbooks and documentation, and conduct regular disaster recovery and failover exercises.

Required Qualifications:
• 3-7+ years of experience as a Site Reliability Engineer, DevOps Engineer, or Cloud Engineer.
• Strong hands-on experience with Microsoft Azure services (VMs, AKS, App Services, Azure SQL, Storage, etc.).
• Proficiency with Terraform for infrastructure provisioning and configuration management.
• Experience with CI/CD tools such as Azure DevOps, GitHub Actions, Jenkins, or similar.
• Strong understanding of Linux/Windows systems administration.
• Familiarity with containerization and Kubernetes (especially AKS).
• Experience with monitoring and logging tools (e.g., Azure Monitor, App Insights, ELK, Prometheus, Grafana).
• Proficient in scripting languages (e.g., PowerShell, Bash, Python).
• Strong troubleshooting and incident management skills.

Preferred Qualifications:
• Certifications: Microsoft Certified: Azure Administrator / DevOps Engineer Associate.
• Experience with GitOps, Helm, or Service Mesh (Istio, Linkerd).
• Knowledge of security best practices and compliance in cloud environments.
• Exposure to Agile/DevOps culture and practices.

Soft Skills:
• Strong communication and collaboration skills.
• Ability to thrive in a fast-paced, dynamic environment.
• Problem-solving mindset with attention to detail.
• Team player with a proactive attitude.