Splunk Inc.
Manager, Site Reliability Engineering (Chicago Area) - 33539
Splunk Inc., Bloomington, Illinois, US, 61791
Role
Join us as we pursue our ground‑breaking vision to make machine data accessible, usable, and valuable to everyone. At Splunk, we are committed to delivering the best experience for our customers and fostering a culture of collaborative success. You will lead and manage one of the largest and most sophisticated cloud‑scale, big‑data, and microservices platforms in the world. You will be responsible for managing engineers who operate highly available, scalable, and cost‑efficient applications, improving the reliability and resiliency of services and infrastructure.
Responsibilities
Lead a team of engineers passionate about large‑scale distributed systems for Splunk Cloud Observability in FedRAMP environments.
Manage across the organization to deliver quality products that delight Splunk’s users.
Mentor and grow teams building a state‑of‑the‑art, cloud‑based environment for massive‑scale data processing.
Partner with Talent Acquisition to recruit, interview, and hire top engineering talent for Splunk’s growing SRE FedRAMP team.
Manage engineers to achieve more than expected, driving team success.
Lead reliability projects including HA, business continuity planning, disaster recovery, backup/restore, RTO, RPO, chaos engineering, application uptime and performance, capacity management & planning, SLI/SLO monitoring, error budgets, production runbooks, tooling, automation, incident management, and cloud cost optimization.
Qualifications
Must‑Have:
8+ years of experience in large‑scale cloud‑native microservices platforms.
2+ years of strong hands‑on management experience deploying, handling, and monitoring large‑scale Kubernetes clusters in the public cloud (AWS or GCP).
Experience leading a team in infrastructure automation and scripting using Python and/or Golang.
Experience managing remote teams.
Strong hands‑on experience with monitoring tools such as Splunk, Prometheus, Grafana, ELK stack, etc. for large‑scale microservices deployments.
Experience with deployment, operations, and performance management of clusters such as Cassandra, Kafka, Elasticsearch, MongoDB, ZooKeeper, Redis, etc.
Excellent problem‑solving, triaging, and debugging skills in large‑scale distributed systems.
Preferred:
Familiarity with compliance environments such as HIPAA, GovCloud, State Government, Federal Government, SOC2, or FedRAMP.
AWS Solutions Architect certification.
Confluent Certified Administrator for Apache Kafka and/or Apache Cassandra Administrator Associate certifications.
Experience with Infrastructure‑as‑Code tools such as Terraform, CloudFormation, Google Deployment Manager, Pulumi, Packer, ARM, etc.
Experience with CI/CD frameworks and pipeline‑as‑code such as Jenkins, Spinnaker, GitLab, Argo, Artifactory, etc.
Proven skills in influencing design, operations, and deployment of highly available software across teams and functions.
Bachelor’s or Master’s in Computer Science, Computer Engineering, or related technical field, or equivalent practical experience.
Annual Base Pay: $195,000.00 - $218,000.00 USD
Splunk, a Cisco company, is an Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis.