1976 Walt Disney Attractions Technology LLC

Lead Site Reliability Engineer

1976 Walt Disney Attractions Technology LLC, Orlando, Florida, US, 32885


Remote type: Primarily On-Site / Occasionally from Home

Locations: Orlando, FL, USA

Time type: Full time

Posted: 2 Days Ago

Job Posting Title: Lead Site Reliability Engineer

Req ID: 10129163

Job Description: We Power the Magic! That's our motto at Disney Experiences (DX). Our team creates world-class immersive digital experiences for the Company's premier vacation brands, including Disney's Parks & Resorts worldwide, Disney Cruise Line, Aulani, a Disney Resort & Spa, and Disney Vacation Club. We are responsible for the end-to-end digital and physical Guest experience for all technology and digital-led initiatives across the Attractions & Entertainment, Food & Beverage, Resorts & Transportation, and Merchandise lines of business, as well as other initiatives including MyDisneyExperience and Hey, Disney! This role sits in the Commerce Shared Services organization within Technology & Digital for Disney Experiences. It works closely with Technical Operations and Product Delivery teams from across the company. The Lead Site Reliability Engineer will report to the Manager-Site Reliability Engineer.

About The Role & Team: This is a team lead role that focuses on engineering and reliability with a team of site reliability engineers. You will be responsible for coordinating the team's efforts across the portfolio of applications it supports. This team needs a strong mentor who can help develop and execute specific reliability plans in line with the business strategy of DX Tech and Digital.

What You'll Do: Lead the evolution of DevOps practices within the broader team framework, guiding others in leveraging this culture to enhance observability practices.

Consult, design, build, and support development pipelines, automate infrastructure and operations, create telemetry for monitoring, engineer high reliability, and reinforce best practices to secure company data.

Bring expert systems administration skills on AWS Cloud, Docker, and Kubernetes, along with extensive experience with web technologies, source control management (GitHub, GitLab), and tools such as Nimbus, ECS, Tomcat, and Harness.

Develop and advocate strategic directions for reliability, observability, and recovery, and bring practical knowledge of systems, networking, operational excellence, and application stability, security, performance, and capacity management.

Plan and coordinate larger efforts for the team of site reliability engineers.

You will be expected to stay up to date with emerging technologies so you can make informed recommendations.

Drive teams to consult, design, build, and support development pipelines, automate infrastructure and operations, build telemetry for monitoring, engineer high reliability, and reinforce best practices to secure company data.

Required Qualifications: Minimum 7 years of related work experience

Demonstrated leadership in implementing observability principles across complex systems and environments, fostering a culture of reliability and resilience

Extensive experience with modern software delivery tools, including GitHub, GitLab, Harness.io, LaunchDarkly, Nimbus, and Kubernetes, and with optimizing workflows to ensure seamless deployment processes

Outstanding communication and leadership abilities to ensure the effective growth and development of the team

A visionary who motivates teams to excel and fosters creativity, consistently driving excellence in all endeavors

An advocate for a diverse and inclusive culture that encourages innovation and ensures every team member feels a sense of belonging

Proficient in implementing observability principles and advanced tools for system enhancement, applying expertise in major APM tools

Fluent in core scripting languages, with advanced programming skills (Python, NodeJS, Golang); experienced with Linux, CLIs, and code editors such as VS Code

Skilled in source control management systems such as GitHub and GitLab, including managing users and repos; proficient in networking protocols, distributed systems, and container platforms (e.g., Docker, ECS)

Experience with cloud hosting services (AWS, Google Cloud, Azure), databases, tooling, and security, as well as CI pipelines, build tools such as Jenkins, RESTful web service calls, and JSON

Outstanding troubleshooting methodology, including teaching new methodologies to the team and evaluating new systems and infrastructure solutions for technical feasibility against standards

Preferred Qualifications: Leveraging AI for predictive insights, driving measurable continuous improvement in system reliability

Required Education: Bachelor's degree in Computer Science, Information Systems, Software, Electrical or Electronics Engineering, or a comparable field of study, and/or equivalent work experience

#DISNEYTECH

Job Posting Segment: Technology & Digital

Job Posting Primary Business: Commerce

Primary Job Posting Category: Site/System Reliability Engineer

Employment Type: Full time

Primary City, State, Region, Postal Code: Orlando, FL, USA

Date Posted: 2025-08-21