Cisco
Manager, SRE FedRAMP-33539
Splunk, a Cisco company, is building a safer and more resilient digital world with an end‑to‑end full‑stack platform made for a hybrid, multi‑cloud world. Leading enterprises use our unified security and observability platform to keep their digital systems secure and reliable. Come help organizations be their best, while you reach new heights with a team that has your back.
Meet the Team
The Splunk Observability Cloud team provides full‑fidelity monitoring and troubleshooting across infrastructure, applications, and user interfaces, in real time and at any scale, to help our customers keep their services reliable, innovate faster, and deliver great customer experiences. Infrastructure Software Engineers at Splunk are cloud‑native systems engineers who use infrastructure‑as‑code, microservices, automation, and efficient design to build, operate, and scale our products.
You will lead and manage one of the largest and most sophisticated cloud‑scale, big data, and microservices platforms in the world. You will be responsible for managing engineers who operate highly available, scalable, and cost‑efficient applications with low operational burden by handling and improving the reliability and resiliency of services and infrastructure. You thrive on driving initiatives in automation, infrastructure‑as‑code, and reliability engineering, and on eliminating tedious manual tasks.
Lead a team of super smart engineers who are passionate about large‑scale distributed systems for Splunk Observability Cloud in FedRAMP environments
Manage across the organization to deliver quality products that delight Splunk's passionate users. Mentor and grow teams of tight‑knit engineers who are building a state‑of‑the‑art, cloud‑based environment for massive‑scale data processing.
Partner with our Talent Acquisition team as we recruit, interview and hire the best engineering talent to join Splunk's growing SRE FedRAMP team!
Manage engineers to achieve more than they thought possible. You enjoy managing and driving teams to success and are fulfilled through the success of others.
Your Impact
High availability (HA), business continuity planning, disaster recovery, backup/restore, recovery time objectives (RTO), and recovery point objectives (RPO)
Chaos engineering
Application uptime and performance
Capacity management & planning
SLIs, SLOs, error budgets, and monitoring dashboards
Deployment and operations of large‑scale distributed data stores and streaming services
Establishing design patterns for monitoring and benchmarking
Establishing and documenting production runbooks and guidelines for developers
Tooling, toil reduction, runbooks & automation to handle production environments
Incident management and improving MTTD/MTTR for services
Cloud cost optimization
Minimum Qualifications
8+ years of experience operating large‑scale cloud‑native microservices platforms.
2+ years of strong hands‑on experience managing teams that deploy, operate, and monitor large‑scale Kubernetes clusters in the public cloud, specifically AWS or GCP.
Experience with, and leading a team in, infrastructure automation and scripting using Python and/or Golang.
Experience managing remote teams.
Strong hands‑on experience with monitoring tools such as Splunk, Prometheus, Grafana, and the ELK stack to build observability for large‑scale microservices deployments.
Experience with deployment, operations, and performance management of one or more large‑scale clusters such as Cassandra, Kafka, Elasticsearch, MongoDB, ZooKeeper, or Redis.
Excellent problem‑solving, triaging, and debugging skills in large‑scale distributed systems.
Preferred Qualifications
Familiarity with working in, or managing teams in, compliance environments such as HIPAA, GovCloud, state government, federal government, SOC 2, or FedRAMP.
AWS Solutions Architect certification preferred.
Confluent Certified Administrator for Apache Kafka and/or Apache Cassandra Administrator Associate certifications are preferred.
Experience with Infrastructure‑as‑Code using Terraform, CloudFormation, Google Deployment Manager, Pulumi, Packer, ARM, etc.
Experience with CI/CD frameworks and Pipeline‑as‑Code tools such as Jenkins, Spinnaker, GitLab, Argo, Artifactory, etc.
Proven ability to work effectively across teams and functions to influence the design, operations, and deployment of highly available software.
Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience.
Why Cisco?
At Cisco, we’re revolutionizing how data and infrastructure connect and protect organizations in the AI era and beyond. We’ve been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint.
Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you’ll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere.
We are Cisco, and our power starts with you.