Cavendish Professionals
Overview
Site Reliability Engineering (SRE) Architect - Munich (Remote/Hybrid)
I am partnered with a global consulting organisation that is expanding its engineering leadership team in Germany. They are seeking an experienced SRE Architect to define and drive the long-term reliability, scalability, and performance strategy across complex, cloud-native systems.
This is a senior architectural role with wide influence across engineering, combining deep technical expertise with leadership. You will set standards, frameworks, and practices that enable teams to deliver world-class services at scale.
Key Responsibilities
Architect & Strategy
- Design highly scalable and fault-tolerant infrastructure on leading cloud platforms (AWS, GCP, or Azure).
Reliability Frameworks
- Define and govern SLOs, SLIs, and error budgets across engineering teams.
Observability
- Lead observability design for metrics, tracing, logging, and alerting.
Automation & IaC
- Champion Infrastructure as Code (Terraform, Ansible) for secure, repeatable provisioning.
Resilience & Recovery
- Develop disaster recovery strategies, resilience patterns, and chaos engineering practices.
Leadership & Mentoring
- Act as a thought leader, mentoring engineers and embedding reliability best practices across the organisation.
Incident Evolution
- Analyse major incidents, drive systemic improvements, and evolve incident management culture.
Key Requirements
- 10+ years in software engineering, DevOps, or systems engineering, including 5+ years in senior SRE/architecture roles.
- Expertise in at least one major cloud provider (AWS, GCP, Azure).
- Strong hands-on experience with Kubernetes and microservices at scale.
- Proven skills in Infrastructure as Code (Terraform, Ansible, Chef, or Puppet).
- Solid background in observability platforms (Prometheus, Grafana, OpenTelemetry, ELK, Datadog, etc.).
- Proficiency in Python or Go for automation and tooling.
- Deep knowledge of distributed systems, networking, and high-availability design patterns.
Nice-to-Haves
- Multi-cloud exposure.
- Professional cloud certifications.
- Knowledge of service mesh technologies.
- DevSecOps/security best practices.
- Experience leading large-scale tech transformations.
If this sounds like something you'd thrive in—or even if you're just curious—I'd love to chat and tell you more.
Cavendish (Recruitment) Professionals Ltd are proud to be an equal opportunity employer, and we believe that inclusivity begins with the candidate experience. All qualified applicants will receive consideration for employment regardless of gender, race, age, sexual orientation, religion, or belief.