Optomi

Site Reliability Engineer (Hybrid 1x a week, up to $87/hr)

Optomi, Orlando, Florida, US, 32885

Optomi, in partnership with an industry-leading entertainment corporation, is seeking a Senior Site Reliability Engineer (SRE) to join their Data Platform team. In this mission-critical role, you will design, scale, and maintain the infrastructure powering data products and real-time insights across digital and physical experiences. This position operates at the intersection of DevOps, data engineering, and platform reliability, working closely with cross-functional teams to ensure the scalability, observability, and reliability of high-throughput data systems. You will leverage automation, infrastructure-as-code, and cloud-native technologies to reduce operational overhead, improve incident response, and drive innovation across petabyte-scale data pipelines.

Responsibilities

Design, scale, and maintain the infrastructure powering data products and real-time insights.
Ensure scalability, observability, and reliability of high-throughput data systems.
Collaborate with cross-functional teams to support DevOps, data engineering, and platform reliability.
Leverage automation and infrastructure-as-code to reduce operational overhead and improve incident response.
Drive innovation across petabyte-scale data pipelines.
Lead incident response, perform root-cause analysis, and drive continuous improvement.
Design and maintain SLAs, SLOs, and SLIs in production systems.
Implement and refine monitoring and telemetry (tracing, metrics, logging).
Contribute to CI/CD automation and cloud-native toolchains.

Qualifications

6+ years of professional software engineering experience, focusing on reliability, infrastructure, or platform engineering.
Strong programming skills in Python and at least one statically typed language (e.g., Java, TypeScript, Go).
Deep hands-on experience with AWS services (Lambda, ECS/EKS, S3, IAM, API Gateway, SNS/SQS, Kinesis).
Proven experience operating and scaling distributed systems in production environments.
Expertise in observability and telemetry design: tracing, metrics, logging.
Proficiency in CI/CD automation, infrastructure-as-code (Terraform, AWS CDK), and DevOps best practices.
Solid understanding of SQL/NoSQL data stores and architectural trade-offs.
Familiarity with agile development workflows, code reviews, and collaborative SDLC processes.
Experience leading incident response, root cause analysis, and driving continuous improvement.
Ability to design and maintain SLAs, SLOs, and SLIs in production systems.
Strong communication and cross-functional collaboration skills.

Nice to have

Experience supporting real-time analytics infrastructure, data pipelines, or streaming platforms.
Familiarity with monitoring tools such as DataDog, especially for serverless applications.
Expertise in performance profiling, distributed tracing, and root cause analysis in complex systems.
Track record of improving reliability metrics (MTTR, deployment frequency, etc.).
Understanding of compliance, governance, and security best practices in cloud-based data environments.
Background in media, entertainment, or high-availability consumer platforms.

Details

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: IT Services and IT Consulting; Entertainment Providers
Location: Lake Buena Vista, FL
Compensation: $75.00-$85.00 per hour
Posted: 1 week ago
