Optomi

Staff Site Reliability Engineer

Optomi, California, Missouri, United States, 65018

This range is provided by Optomi. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more.

Base pay range: $70.00/hr - $85.00/hr


Staff Site Reliability Engineer (SRE), Remote – Must work PST hours

Optomi, in partnership with a leading consulting partner supporting large-scale modernization initiatives, is seeking a Staff Site Reliability Engineer (SRE) to drive performance, reliability, and operational excellence across critical applications. This role will serve as a key member of stream-aligned teams, ensuring the stability, scalability, and observability of enterprise systems through proactive monitoring, automation, and performance optimization. If you’re passionate about high-impact, hands‑on reliability engineering and thrive in complex, fast-moving environments, this is an ideal opportunity to make a measurable difference.

Key Responsibilities:

Conduct load and performance testing using tools like JMeter or Gatling to assess scalability, identify bottlenecks, and optimize system performance. Ensure applications can handle expected workloads while maintaining optimal performance.

Implement and maintain real-time monitoring solutions leveraging Datadog, Prometheus, Grafana, and New Relic. Track health, performance metrics, and SLAs to detect and resolve issues before user impact.

Investigate and resolve incidents, outages, and performance degradations using diagnostic tools and root‑cause analysis. Lead cross‑functional incident response efforts, coordinating communication and documenting post‑incident reviews.

Design and implement effective error handling, alerting, and logging strategies to capture and triage anomalies. Improve reliability through reduced error rates and faster recovery times.

Develop and maintain automation scripts and infrastructure‑as‑code frameworks using Go, Bash, Terraform, Ansible, Docker, and Helm. Streamline deployment, configuration, and operational workflows to reduce manual intervention.

Partner with security engineers to ensure alignment with Zero Trust policies, vulnerability remediation, and ISO standards. Utilize tools such as Snyk, Datadog, and Jira for ongoing compliance tracking.

Collaborate with development teams to promote reliability‑focused coding practices and improve observability within applications, reducing operational risk and increasing development velocity.

What the right candidate would enjoy:

Contribute to mission‑critical systems supporting statewide initiatives!

Work with modern DevOps and observability stacks in a large‑scale production environment!

Flexible opportunity with collaborative, outcome‑driven teams!

Required Skills & Experience:

Hands‑on SRE, DevOps, or Systems Engineering experience within enterprise or government environments

Proficiency with performance testing tools (e.g., JMeter, Gatling)

Strong experience with monitoring and observability tools (Datadog, New Relic, Prometheus, Grafana)

Expertise in automation and scripting (Go, Bash) and infrastructure‑as‑code tools (Ansible, Terraform)

Proven troubleshooting and problem‑solving abilities for complex, distributed systems

Understanding of Agile and DevOps methodologies with a focus on continuous improvement

Excellent communication and collaboration skills to work across cross‑functional teams

Technologies & Tools:

Testing: JMeter, Gatling

Seniority level: Mid‑Senior level

Employment type: Full‑time

Job function: IT Services and IT Consulting
