HTP Solutions
We are seeking a highly skilled
Site Reliability Engineer (SRE)
with expertise in
Microsoft Azure
and
Infrastructure as Code (IaC)
using
Terraform . As an SRE, you will be responsible for maintaining high availability, scalability, and performance of critical systems while driving infrastructure automation and reliability best practices.
Key Responsibilities: •Design, implement, and maintain scalable, resilient, and secure cloud infrastructure on
Azure . •Automate provisioning, configuration, and monitoring of infrastructure using
Terraform . •Develop and enforce
SRE principles
like
SLAs, SLOs, SLIs , and error budgets. •Build observability and monitoring systems using tools like
Azure Monitor, Log Analytics, Prometheus, Grafana , etc. •Respond to production incidents, conduct root cause analysis, and drive incident resolution. •Implement CI/CD pipelines and integrate automated testing and deployment. •Collaborate with software engineering, DevOps, and IT operations teams to enhance system performance and reliability. •Optimize cost, performance, and security of cloud infrastructure. •Maintain runbooks, documentation, and conduct regular disaster recovery and failover exercises.
Required Qualifications: • 3-7+ years
of experience as a
Site Reliability Engineer, DevOps Engineer, or Cloud Engineer . •Strong hands-on experience with
Microsoft Azure
services (VMs, AKS, App Services, Azure SQL, Storage, etc.). •Proficiency with
Terraform
for infrastructure provisioning and configuration management. •Experience with
CI/CD tools
such as Azure DevOps, GitHub Actions, Jenkins, or similar. •Strong understanding of
Linux/Windows
systems administration. •Familiarity with
containerization
and
Kubernetes
(especially AKS). •Experience with
monitoring and logging tools
(e.g., Azure Monitor, App Insights, ELK, Prometheus, Grafana). •Proficient in scripting languages (e.g.,
PowerShell, Bash, Python ). •Strong troubleshooting and incident management skills.
Preferred Qualifications: •Certifications:
Microsoft Certified: Azure Administrator / DevOps Engineer Associate . •Experience with
GitOps ,
Helm , or
Service Mesh
(Istio, Linkerd). •Knowledge of
security best practices
and compliance in cloud environments. •Exposure to
Agile/DevOps
culture and practices.
Soft Skills: •Strong communication and collaboration skills. •Ability to thrive in a fast-paced, dynamic environment. •Problem-solving mindset with attention to detail. •Team player with a proactive attitude.
Site Reliability Engineer (SRE)
with expertise in
Microsoft Azure
and
Infrastructure as Code (IaC)
using
Terraform . As an SRE, you will be responsible for maintaining high availability, scalability, and performance of critical systems while driving infrastructure automation and reliability best practices.
Key Responsibilities: •Design, implement, and maintain scalable, resilient, and secure cloud infrastructure on
Azure . •Automate provisioning, configuration, and monitoring of infrastructure using
Terraform . •Develop and enforce
SRE principles
like
SLAs, SLOs, SLIs , and error budgets. •Build observability and monitoring systems using tools like
Azure Monitor, Log Analytics, Prometheus, Grafana , etc. •Respond to production incidents, conduct root cause analysis, and drive incident resolution. •Implement CI/CD pipelines and integrate automated testing and deployment. •Collaborate with software engineering, DevOps, and IT operations teams to enhance system performance and reliability. •Optimize cost, performance, and security of cloud infrastructure. •Maintain runbooks, documentation, and conduct regular disaster recovery and failover exercises.
Required Qualifications: • 3-7+ years
of experience as a
Site Reliability Engineer, DevOps Engineer, or Cloud Engineer . •Strong hands-on experience with
Microsoft Azure
services (VMs, AKS, App Services, Azure SQL, Storage, etc.). •Proficiency with
Terraform
for infrastructure provisioning and configuration management. •Experience with
CI/CD tools
such as Azure DevOps, GitHub Actions, Jenkins, or similar. •Strong understanding of
Linux/Windows
systems administration. •Familiarity with
containerization
and
Kubernetes
(especially AKS). •Experience with
monitoring and logging tools
(e.g., Azure Monitor, App Insights, ELK, Prometheus, Grafana). •Proficient in scripting languages (e.g.,
PowerShell, Bash, Python ). •Strong troubleshooting and incident management skills.
Preferred Qualifications: •Certifications:
Microsoft Certified: Azure Administrator / DevOps Engineer Associate . •Experience with
GitOps ,
Helm , or
Service Mesh
(Istio, Linkerd). •Knowledge of
security best practices
and compliance in cloud environments. •Exposure to
Agile/DevOps
culture and practices.
Soft Skills: •Strong communication and collaboration skills. •Ability to thrive in a fast-paced, dynamic environment. •Problem-solving mindset with attention to detail. •Team player with a proactive attitude.