HCLTech Germany
Site Reliability Engineer (24x7 Operational Support) (w/m/d)
HCLTech Germany, Germantown, Ohio, United States
We are HCLTech, one of the fastest-growing large tech companies in the world and home to 225,000+ people across 60 countries, supercharging progress through industry‑leading capabilities centered around Digital, Engineering and Cloud. The driving force behind that work, our people, are diverse, creative, and passionate, raising the bar for excellence on a regular basis. We, in turn, work hard to bring out the best in them as we strive to help them find their spark and become the best version of themselves that they can be. If all this sounds like an environment you’ll thrive in, then you’re in the right place. Join us on our journey in advancing the technological world through innovation and creativity.
We are seeking a Site Reliability Engineer (SRE) with a strong background in observability, secure logging, and automation. The ideal candidate will have hands‑on experience with Elasticsearch and/or Prometheus platforms. This role encompasses critical responsibilities in platform operations, including incident management, execution of scheduled maintenance, and contributing to engineering tasks focused on enhancing system stability. The SRE will also be responsible for adhering to standard operating procedures (SOPs) and actively contributing to their continuous improvement by providing constructive feedback.
Key Responsibilities:
Manage Kubernetes and container orchestration, including Helm chart configurations and CI/CD pipelines (Jenkins, ArgoCD). Develop automation scripts (Python, Bash, Go) and deploy Infrastructure‑as‑Code (IaC) solutions
Maintain Prometheus solutions (scrape configurations, alert rules, PromQL queries), administer Thanos and Grafana
Configure and optimise Elasticsearch clusters, Logstash pipelines, and Kibana dashboards for secure, scalable log processing
Participate in 24x7 on‑call rotations for rapid incident response, troubleshoot platform, data and performance issues, and engage in Major Incident Management (MIM)
Ensure system operations meet security and data protection requirements, maintain secure documentation, and manage access control policies
Requirements:
Strong grasp of Linux concepts, preferably in Kubernetes environments
Solid understanding of networking fundamentals and REST APIs
Proficiency in Python, Go, or Bash
Proficiency in Git‑based configuration management workflows
Familiarity with CI/CD and deployment tooling such as Helm, Jenkins, or ArgoCD
Experience with Elasticsearch and/or OpenSearch
Willingness to work shift‑based 24x7 on‑call support, including weekends and holidays
Must hold a Ü2 security clearance or be willing to undergo the clearance process
Citizenship required: a member state of the EU and NATO. No dual citizenship outside these countries
Preferred Certifications: Elastic Certified Engineer, LPIC Level 2, Kubernetes Administrator
We promote equal opportunities for all employees, regardless of their cultural and social background, gender, disability, age, religion, beliefs, and sexual identity. We give priority consideration to severely disabled applicants and those of equal status in the case of equal suitability.
Frankfurt am Main, Hesse, Germany (2 weeks ago)