Optomi
Site Reliability Engineer (Hybrid, Orlando, FL)
Optomi, in partnership with a leading enterprise organization, is seeking a Site Reliability Engineer (SRE) to join a cloud-focused engineering team supporting large-scale, customer-facing systems. This role requires onsite presence two days per week in Orlando, FL. The ideal candidate is a strong cloud engineer with AWS expertise, hands-on Terraform experience, solid scripting skills, and the confidence to communicate clearly with stakeholders and executive leadership during high-pressure situations.
What the Right Candidate Will Enjoy!
Working in a modern cloud environment with a primary focus on AWS and exposure to GCP and Azure!
Supporting enterprise-scale systems with real business impact!
Participating in incident bridge calls and collaborating directly with leadership!
Maintaining and improving existing Infrastructure as Code environments!
Joining a small, highly collaborative SRE/DevOps-focused team!
Having autonomy, trust, and visibility while contributing to critical initiatives!
Experience of the Right Candidate:
Strong hands-on experience supporting AWS cloud environments.
Experience working with GCP and/or Azure in an enterprise setting.
Hands-on experience maintaining and modifying existing Terraform infrastructure.
Comfortable scripting and troubleshooting code-related issues (Python, Bash, Node.js, or similar).
Experience using monitoring and observability tools such as Splunk, CloudWatch, Grafana, or AppDynamics.
Ability to clearly communicate technical issues to both technical and non-technical audiences.
Confidence speaking on calls with large groups, including stakeholders and leadership.
Experience working in on-call or incident-response environments.
Responsibilities of the Right Candidate:
Maintain, support, and optimize cloud infrastructure across AWS, GCP, and Azure environments.
Work with existing Terraform and Atlantis configurations to support infrastructure needs.
Troubleshoot infrastructure, application, and CI/CD-related issues.
Participate in incident bridge calls and provide clear status updates to leadership.
Support load balancers, containerized workloads, and cloud-native services.
Collaborate with application teams to identify whether issues are infrastructure- or code-related.
Utilize monitoring and alerting tools to ensure system performance and reliability.
Communicate effectively with engineers, stakeholders, and executives during incidents and projects.
Monitoring, Tooling & Cloud Exposure:
AWS services including EC2, ECS, EKS, Fargate, Lambda, API Gateway, S3, ALB/ELB, VPC, IAM, and KMS.
Google Cloud Platform services including App Engine, Kubernetes, Cloud Functions, and IAM.
Infrastructure as Code using Terraform (existing configurations).
Monitoring and observability tools including Splunk, CloudWatch, Grafana, and AppDynamics.
Configuration and automation tools such as Chef, Ansible, Rundeck, and Vault.
Message queuing technologies including RabbitMQ and Pub/Sub.
Preferred Qualifications:
Experience supporting load balancers and high-traffic systems.
Background in SRE or DevOps-oriented teams.
Experience working in hybrid cloud and on-prem environments.
Strong Linux or Windows systems administration background.
Enterprise experience supporting customer-facing applications.