Topgolf
The Site Reliability Engineer (SRE) is responsible for ensuring the reliability, performance, and security of our internal- and external-facing platforms. This role combines deep expertise in Kubernetes, cloud infrastructure, observability, and Infrastructure-as-Code with strong problem-solving and incident management skills to keep systems scalable, resilient, and compliant. Collaborating closely with engineering and infrastructure teams, the SRE will drive automation, performance optimization, and reliability-focused practices that deliver seamless experiences for every guest, ensuring the fun never stops at Topgolf.
Platform & Application Reliability
Ensure the Booking Platform runs smoothly, 24/7 — high uptime, fast response times, zero missed tee times.
Proactively monitor system health using observability tools, and rapidly troubleshoot and resolve issues when they arise.
Implement and maintain service-level objectives (SLOs) and service-level indicators (SLIs) for key booking and venue services.
Lead incident response, root cause analysis, and post-incident follow-up to drive continuous improvement.
Automation & Infrastructure
Conduct performance tuning and capacity planning through load and stress testing.
Use Infrastructure-as-Code (Terraform, Helm, Ansible) to standardize and scale across cloud and venue environments.
Collaboration
Work hand in hand with the platform engineering, support, engineering, and infrastructure teams to keep reliability front and center in every release.
Contribute to reliability-focused design reviews, architecture discussions, and operational readiness assessments.
Share reliability insights and best practices across teams to improve guest-facing tech everywhere.
Security & Compliance
Keep booking and venue systems compliant with industry and internal standards (PCI, SOC).
Qualifications
Must-Haves
3+ years of experience in SRE, DevOps, or Platform Engineering.
Strong background in Kubernetes and cloud architecture.
Skilled in observability tooling (Observe, New Relic, Prometheus, Grafana, ELK, Datadog, etc.).
Proficient with Infrastructure-as-Code (IaC) tools (Terraform, Helm, Ansible) and CI/CD workflows.
Strong scripting chops in Python and/or Bash.
Proven ability to manage incidents, analyze root causes, and improve reliability.
Nice-to-Haves
Experience with high-traffic, guest-facing systems (reservations, payments, events).
Familiarity with multi-cluster Kubernetes and hybrid deployments.
Knowledge of distributed databases, caching strategies, and performance optimization.
Exposure to microservices and event-driven architectures.
At Topgolf, we don’t just build systems — we create moments that matter. You’ll have the chance to work on high-impact, high-visibility platforms that shape the guest experience at every venue. Your work will keep the fun flowing and the game going for millions of guests worldwide.
EEO Statement: Topgolf is a global sports and entertainment community and is committed to equal opportunity and is firmly committed to preventing discrimination and harassment, including sexual misconduct, based on legally protected diversity characteristics (such as race, color, religion, national origin, sex, age, disability, sexual orientation, gender identity or expression, family status, citizenship, genetic information and veteran status) in its application and hiring processes and in its employment decisions. As an affirmative action employer, Topgolf also takes steps to prevent retaliation and to create a respectful, equitable and inclusive environment for our Guests, Associates, business partners, vendors, and the communities we serve.