Optomi
Optomi, in partnership with our client, is seeking an experienced SRE II to join their team for a 6-month contract-to-hire opportunity. The role is hybrid, with 2 days per week onsite in Irving, TX.
W2 only - no C2C / sponsorship at this time.
We are seeking a highly skilled Site Reliability Engineer II to join our engineering organization. This role focuses on building resilient, scalable, and automated systems, NOT traditional production support. The ideal candidate has hands‑on engineering experience across cloud infrastructure, observability, automation, and reliability‑focused development.
You will work closely with development, cloud engineering, and platform teams to ensure high availability, optimal performance, and operational excellence of critical customer‑facing applications.
Key Responsibilities
Contribute directly to the reliability, scalability, performance, and security of critical applications.
Build reusable services, automation, and frameworks that improve platform stability and developer velocity.
Cloud & Platform Engineering
Design and enhance cloud infrastructure using Azure services, including:
Azure Service Bus
Event Hub
Azure SQL
AKS (Azure Kubernetes Service)
Function Apps
App Services
Implement and manage Infrastructure as Code (IaC) using Terraform.
Containerization & Orchestration
Build and deploy containerized applications using Docker (2-3+ years).
Support Kubernetes workloads via AKS, including scaling, upgrades, and cluster reliability improvements.
Development & DevOps
Collaborate with development teams using a working knowledge of .NET.
Improve CI/CD workflows using Azure DevOps (ADO).
Monitoring, Observability & Incident Response
Implement and optimize monitoring and alerting strategies.
Use Splunk Observability Cloud (preferred) or equivalent observability platforms to enhance visibility and reduce MTTR.
Drive proactive incident identification, root‑cause analysis, and long‑term fixes.
Performance, Reliability & Scalability Enhancements
Design and implement SLOs, SLIs, and error budgets.
Develop auto‑scaling policies, failover strategies, and disaster recovery procedures.
Optimize application and database performance to ensure reliability across high‑traffic, mission‑critical systems.
Required Qualifications
3-5+ years of hands‑on SRE experience.
Bachelor's degree in Computer Science, Engineering, or a related technical field (or equivalent experience).
Master's degree preferred.
Hands‑on experience with:
Azure Cloud (AKS, Service Bus, Event Hub, SQL, Function Apps, App Services)
Terraform
Docker
Azure DevOps
Monitoring tools (Splunk Observability Cloud preferred)
.NET ecosystem (understanding of development fundamentals)
Preferred Skills
Experience designing resilient, distributed systems.
Strong troubleshooting and analytical skills.
Performance tuning across applications, databases, and cloud services.
Experience improving uptime, latency, throughput, or cost efficiency of production applications.
Familiarity with SRE principles and modern operational practices.