o9 Solutions

SRE Manager

o9 Solutions, Dallas, Texas, United States, 75215

Transforming the Future of Enterprise Planning

At o9, our mission is to be the Most Value-Creating Platform for enterprises by transforming decision-making through our AI-first approach. By integrating siloed planning capabilities and capturing millions, even billions, in value leakage, we help businesses plan smarter and faster.

This not only enhances operational efficiency but also reduces waste, leading to better outcomes for both businesses and the planet. Global leaders like Google, PepsiCo, Walmart, T-Mobile, AB InBev, and Starbucks trust o9 to optimize their supply chains.

Senior Site Reliability Engineering Manager

At o9, we invest in people. We seek talented, driven individuals to power our transformative approach. You'll thrive in a dynamic, supportive environment, growing while making a real impact.

As the Site Reliability Engineering Manager at o9 Solutions, you will lead a high-performing global team responsible for ensuring the availability, performance, and scalability of our SaaS-based supply chain and retail planning platform. This leadership role requires a strategic thinker with strong technical acumen and a passion for building resilient systems. You will partner closely with cross-functional teams to drive improvements in reliability, system architecture, incident management, and DevOps best practices.

What you'll do for us

Team Leadership & Development

Hire, mentor, and manage a globally distributed team of Site Reliability Engineers.
Foster a culture of continuous improvement, accountability, collaboration, and operational excellence.
Set performance goals, conduct regular feedback sessions, and support career growth for team members.

Reliability & Performance Management

Own system uptime and SLA compliance across o9's cloud-native production environment.
Drive root cause analysis and implement post-incident learning processes to improve system resilience.
Oversee the design and implementation of robust monitoring, alerting, and logging solutions.

Operational Strategy & Automation

Lead initiatives to improve infrastructure automation, deployment pipelines, and CI/CD practices.
Champion Infrastructure as Code (IaC) and GitOps best practices.
Manage capacity planning, scalability efforts, and performance tuning across services.

Cross-functional Collaboration

Work closely with Engineering, QA, Product, and Customer Support teams to embed reliability into every stage of the software lifecycle.
Advocate for SRE principles in system design, ensuring high availability and fault tolerance.
Collaborate with cloud service providers (AWS, Azure, GCP) to optimize performance and cost.

Incident Response & Support

Oversee 24/7 on-call rotations and ensure timely response to production incidents.
Implement and refine incident management processes and playbooks.
Communicate effectively with stakeholders during and after major incidents.

What you'll have

Education & Experience

Bachelor's degree in Computer Science, Engineering, or a related field required; Master's degree preferred.
8+ years of experience in DevOps, SRE, or infrastructure roles, with 2+ years leading or managing technical teams.
Experience operating complex, cloud-native production systems at scale.

Certifications

Relevant cloud certifications (AWS, Azure, or GCP) strongly preferred.
Certified Kubernetes Administrator (CKA) certification is a plus.

Technical Proficiency

Strong knowledge of cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes).
Expertise in observability tools (Prometheus, Grafana, Datadog, etc.) and incident management platforms.
Experience with configuration management tools (Terraform, Ansible, Helm, etc.).
Solid understanding of networking, security, Linux internals, and distributed systems.

Soft Skills

Proven ability to lead technical teams through high-stakes, high-impact situations.
Strong communication skills with the ability to translate complex topics into clear stakeholder updates.
Strategic mindset with a bias for action and problem-solving.

This position at o9 Solutions has an annual salary range of $149,818-$205,999. Additionally, you may be eligible to participate in our medical, retirement, and other company-sponsored benefits. The above information reflects the expected base salary range, although the lower and upper bounds may vary based on location, skills, experience, certifications, licenses, or other relevant factors.

More about us...

At o9, transparency and open communication are at the core of our culture. Collaboration thrives across all levels: hierarchy, distance, and function never limit innovation or teamwork. Beyond work, we encourage volunteering opportunities, social impact initiatives, and diverse cultural celebrations.

With a $3.7 billion valuation and a global presence across Dallas, Amsterdam, Barcelona, Madrid, London, Paris, Tokyo, Seoul, and Munich, o9 is among the fastest-growing technology companies in the world. Through our aim10x vision, we are committed to AI-powered management, driving 10x improvements in enterprise decision-making. Our Enterprise Knowledge Graph enables businesses to anticipate risks, adapt to market shifts, and gain real-time visibility. By automating millions of decisions and reducing manual interventions by up to 90%, we empower enterprises to drive profitable growth, reduce inefficiencies, and create lasting value.

o9 is an equal-opportunity employer that values diversity and inclusion. We welcome applicants from all backgrounds, ensuring a fair and unbiased hiring process. Join us as we continue our growth journey!