Hirexa Solutions

Site Reliability Engineer

Hirexa Solutions, Germantown, Ohio, United States

Overview

Job Title: Site Reliability Engineer

Location: Germany (Remote)

Employment Type: FTE/FTC

About Hirexa Solutions:

Hirexa Solutions is a recruitment provider serving the United States, United Kingdom, Europe, and India. We empower clients to improve productivity, adopt agile structures, and execute project deliverables through intelligent technology and people-focused approaches.

About the Role:

We are seeking a Site Reliability Engineer (SRE) with a strong background in observability, secure logging, and automation. The ideal candidate will have hands-on experience with Elasticsearch and/or Prometheus. Responsibilities include platform operations, incident management, maintenance tasks, and contributing to engineering efforts to improve system stability. The SRE will adhere to SOPs and contribute to their continuous improvement through feedback.

Mandate conditions:

Skill set: Observability, Network, Open Observability, SNMP, SSH, Prometheus, Visualization (Grafana), CI/CD (GitHub), Cluster management, Private Cloud, Kubernetes, Alert management, Logstack, Troubleshooting, Repositories, DNS, IP address range, TCP connections, Linux
Must reside in Germany, or be willing to relocate and be in Germany at the time of joining
Ü2 security clearance: must be comfortable undergoing the process
24x7 Operational Support

Key Responsibilities:

Platform Engineering & DevOps: Manage Kubernetes and container orchestration, including Helm chart configurations and CI/CD pipelines (Jenkins, ArgoCD). Develop automation scripts (Python, Bash, Go) and deploy Infrastructure-as-Code (IaC) solutions.
Observability, Monitoring & Visualization: Maintain Prometheus configurations, alert rules, and Grafana dashboards; administer Thanos and Grafana.
Elastic Stack Operations & Log Management: Configure and optimise Elasticsearch clusters, Logstash pipelines, and Kibana dashboards for secure, scalable log processing.
Incident Response, Troubleshooting & Collaboration: Participate in 24x7 on-call rotations, troubleshoot platform, data, and performance issues, and engage in Major Incident Management (MIM).
Secure Operations & Compliance: Ensure operations meet security and data protection requirements, maintain secure documentation, and manage access control policies.

Qualifications, Requirements, and Skills:

Strong Linux knowledge, preferably in Kubernetes environments.
Solid networking fundamentals and an understanding of REST APIs.
Proficiency in Python, Go, or Bash.
Experience with Git-based configuration management workflows.
Familiarity with CI/CD tools like Helm, Jenkins, or ArgoCD.
Experience with Elasticsearch and/or OpenSearch.
Willingness to work 24x7 on-call shifts, including weekends and holidays.
Must possess Ü2 security clearance.
Citizenship: EU and NATO member states only. No dual citizenship outside these countries.
Must reside in Germany and hold a German labor contract.

Preferred Certifications: Elastic Certified Engineer, LPIC Level 2, Kubernetes Administrator.

How to Apply:

If you are interested in this opportunity, please submit your resume. We look forward to hearing from you!

Details

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Consulting, Engineering, and Analyst
Industries: IT Services and IT Consulting

Note: This posting is for an exciting SRE role with 24x7 operational responsibilities.
