ZipRecruiter
Sr Principal Site Reliability Engineer (SASE)
ZipRecruiter, Santa Clara, California, US, 95053
Your Career
We are looking for a proactive and innovative Site Reliability Engineer (SRE) to join our growing team. In this role, you will be at the forefront of ensuring our services are reliable, scalable, and efficient. You will have the unique opportunity to leverage cutting-edge AI tools to redefine our operational practices and to build powerful self-service tools that empower our engineering teams. If you are passionate about building resilient systems and automating everything, we want to hear from you!
Your Impact
Develop Intelligent Automation: Design, build, and maintain automation that handles everything from provisioning and deployment to failure detection and remediation.
Empower Engineers with Self-Service Tools: Build and maintain user-friendly self-service tooling, including internal web portals, Slack bots, and automated JIRA workflows, to streamline developer and operational tasks.
Define and Uphold Reliability Standards: Establish and manage Service Level Indicators (SLIs) and Service Level Objectives (SLOs) so that we meet and exceed our Service Level Agreements (SLAs).
Lead Incident Response: Act as a key leader during production incidents, driving resolution and conducting blameless postmortems to prevent recurrences.
Leverage AI for SRE: Use AI-powered tools for advanced observability, anomaly detection, predictive alerting, and automation of complex operational tasks to improve system reliability.
Build for Scale: Collaborate with engineering teams to design and implement scalable, highly available, and secure infrastructure.
Qualifications
Proven experience as a Site Reliability Engineer, DevOps Engineer, or in a similar software engineering role.
Strong proficiency in a programming language such as Python or Go.
Experience building self-service tools (e.g., internal web portals, Slack integrations) that improve developer productivity and reduce operational toil.
Deep understanding of SLIs, SLOs, and SLAs, and experience implementing them.
Hands-on experience with incident management protocols and on-call rotations.
Familiarity with AI/ML tools in an operational context, for tasks such as log analysis, anomaly detection, or automated remediation.
Proficiency with cloud platforms (GCP, AWS, or Azure) and with containerization and container orchestration tools (Docker, Kubernetes).
A strong problem-solving mindset and a passion for continuous improvement and learning.
Additional Information
The Team
As a member of the SRE team, you will build mission-critical platforms, tools, and processes that ensure the highest levels of availability and reliability across all our applications. We need creative, innovative problem solvers who can partner with our application development teams to make their services more usable. Our SRE team has a standout opportunity to build tools, frameworks, and cloud platforms that will support our company’s growth over the next decade. If you are a self-starter who jumps on new ideas to make the platform more stable, secure, and feature-rich, this is your next career move.
Compensation Disclosure
The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non-sales roles) or base salary plus commission target (for sales/commissioned roles) is expected to be between $220,000 and $270,000 per year. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found here.
Our Commitment
We’re problem solvers who take risks and challenge cybersecurity’s status quo. It’s simple: we can’t accomplish our mission without diverse teams innovating, together. We are committed to providing reasonable accommodations for all qualified individuals with a disability. If you require assistance or accommodation due to a disability or special need, please contact us at accommodations at paloaltonetworks dot com. Palo Alto Networks is an equal opportunity employer. We celebrate diversity in our workplace, and all qualified applicants will receive consideration for employment without regard to ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, physical or mental disability, political affiliation, protected veteran status, or other legally protected characteristics. All your information will be kept confidential according to EEO guidelines.