GoTo Foods
Director, Site Reliability Engineering
Job Summary
We are seeking a Director of Site Reliability Engineering (SRE) to lead and evolve our reliability, observability, and automation initiatives across cloud-native, multi-tenant systems. This role is critical to driving uptime, performance, and efficiency in our production and customer-facing environments while fostering a culture of continuous improvement and operational excellence.
Essential Functions
- Evolve a high-performing SRE team into a strategic, forward-leaning engineering force focused on innovation, automation, and measurable business impact
- Define and drive an advanced SRE roadmap centered on self-healing systems, adaptive scaling, and platform resilience
- Advance existing SLAs, SLOs, and SLIs into predictive, business-aligned reliability models; formalize executive-level SLO reporting
- Lead efforts to evolve observability into a proactive, AI/ML-driven capability for anomaly detection, early warning, and service health forecasting
- Strengthen incident response by integrating intelligent automation, enhancing runbooks, and refining on-call strategies for faster mitigation
- Expand chaos engineering and resilience testing practices across critical systems; institutionalize capacity stress testing and failover validation
- Refine CI/CD pipelines to support safe, high-frequency deployments with zero-touch rollback and dynamic environment provisioning
- Institutionalize Infrastructure as Code (IaC) patterns to drive repeatable, auditable infrastructure operations at scale
- Optimize FinOps practices with actionable insights into cost vs. performance tradeoffs and service-level ROI
- Drive deeper integration between SRE, Security, and Compliance for faster detection, triage, and resolution of security incidents
- Balance system reliability and deployment velocity by analyzing error rates and stability indicators
- Conduct Blameless Postmortems (BPM) for priority 1 incidents
- Provide go-live leadership for high-stakes brand launches and system expansions on the NextGen platform
- Partner with architecture and product teams to embed observability, scalability, and cost awareness into solution design
- Modernize disaster recovery operations to meet aggressive RTO/RPO objectives with fully automated failover mechanisms
- Resolve existing technical debt and avoid creating new technical debt
- Oversee vendor performance, contract renewals, and third-party compliance across tooling and infrastructure partnerships
- Ensure quarterly contractor audits, identity governance, and system access reviews are thorough and timely
- Cultivate a culture of continuous learning, experimentation, and innovation through coaching, advanced training, and stretch assignments
- Develop a continuous improvement framework based on agile retrospectives, SLIs, and service reviews
- Elevate the team's visibility and influence across the organization by aligning technical outcomes with business value
Education
Bachelor’s degree in Information Systems or a related discipline (required)
Work Experience
- Minimum 10 years of experience in software development or information technology
- Minimum 5 years working with cloud-native solutions, preferably on Azure
- Minimum 5 years of experience in DevOps and/or Site Reliability Engineering
- Minimum 4 years of people management (hiring, mentoring, and managing engineering staff)
- Strong knowledge of Infrastructure as Code (IaC)
- Experience with pipeline-based SDLC and CI/CD automation
- Experience working on a Scrum team
Skills
- Ability to communicate complex technical concepts to the executive team, business leaders, and franchisees
- Ability to develop and maintain positive business relationships and foster an environment of mutual respect, understanding, trust, and support
- Ability to coach employees in a positive manner
- Ability to facilitate the resolution of differing views
- Ability to collect information from others without putting them in a defensive posture
- Ability to adapt and adjust planned work by analyzing work demands, competing priorities, and tight deadlines, and to identify the most effective and efficient means of accomplishing tasks within the parameters of the organizational structure, processes, systems, and policies
- Ability to exercise judgment and discretion in dealing with matters of significance and a sensitive nature
- Excellent organizational, communication, and leadership skills
- Excellent analytical and problem-solving skills
- Ability to develop, communicate, and implement strategies and tactics
- Strong business acumen and a sense of urgency to achieve business results
Certifications
Travel Requirement
None
Seniority level: Director
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Food and Beverage Services