Ad Astra Consultants

Site Reliability Engineer (24x7 Operational Support)

Ad Astra Consultants, Germantown, Ohio, United States

We are hiring for our client, a global technology company that is home to more than 223,000 people across 60 countries and delivers industry-leading capabilities centered around digital, engineering, cloud and AI, powered by a broad portfolio of technology services and products. The company works with clients across all major verticals, providing industry solutions for Financial Services, Manufacturing, Life Sciences and Healthcare, High Tech, Semiconductor, Telecom and Media, Retail and CPG, and Public Services. Consolidated revenues for the 12 months ending June 2025 totaled $14 billion.

Mandatory Conditions

Skills required: Observability (network, open observability), SNMP protocol, SSH, Prometheus, visualization (Grafana), CI/CD (GitHub), cluster management, private cloud, Kubernetes clusters, alert management, operations (Logstash), troubleshooting, repositories, curl command, DNS, IP address ranges, TCP connections, Linux.
Must be an EU citizen from a NATO country.
Should live in Germany or be open to relocating, and must be in Germany at the time of joining.
Ü2 security clearance: should be comfortable undergoing this process.
24x7 operational support.
Is German language mandatory? No.

About the Role

We are seeking a Site Reliability Engineer (SRE) with a strong background in observability, secure logging, and automation. The ideal candidate will have hands-on experience with Elasticsearch and/or Prometheus platforms. This role encompasses critical responsibilities in platform operations, including incident management, execution of scheduled maintenance, and contributing to engineering tasks focused on enhancing system stability. The SRE will also be responsible for adhering to standard operating procedures (SOPs) and actively contributing to their continuous improvement by providing constructive feedback.

Key Responsibilities

Platform Engineering & DevOps: Manage Kubernetes and container orchestration, including Helm chart configurations and CI/CD pipelines (Jenkins, ArgoCD). Develop automation scripts (Python, Bash, Go) and deploy Infrastructure-as-Code (IaC) solutions.
Observability, Monitoring & Visualisation: Maintain Prometheus solutions (scrape configurations, alert rules, PromQL queries), administer Thanos and Grafana.
Elastic Stack Operations & Log Management: Configure and optimise Elasticsearch clusters, Logstash pipelines, and Kibana dashboards for secure, scalable log processing.
Incident Response, Troubleshooting & Collaboration: Participate in 24x7 on-call rotations for rapid incident response, troubleshoot platform, data and performance issues, and engage in Major Incident Management (MIM).
Secure Operations & Compliance: Ensure system operations meet security and data protection requirements, maintain secure documentation, and manage access control policies.

Qualifications, Requirements, and Skills

Strong grasp of Linux concepts, preferably in Kubernetes environments.
Solid understanding of networking fundamentals and REST APIs.
Proficiency in Python, Go, or Bash.
Proficiency in Git-based configuration management workflows.
Familiarity with CI/CD tools like Helm, Jenkins, or ArgoCD.
Experience with Elasticsearch and/or OpenSearch.
Willingness to work shift-based 24x7 on-call support, including weekends and holidays.
Must possess Ü2 security clearance.

Citizenship required:

Member state of EU and NATO.
No dual citizenship outside these countries.
Must reside in Germany and hold a German labor contract.

Preferred Certifications: Elastic Certified Engineer, LPIC Level 2, Kubernetes Administrator.

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Information Technology

Industries

IT Services and IT Consulting and Software Development

Frankfurt am Main, Hesse, Germany
2 weeks ago
