Eunice Group
Systems Reliability Engineer
Eunice Energy Group
Eunice Group, Eunice, Louisiana, United States, 70535
Who We Are
EUNICE is a fast-growing technology and energy group committed to innovation, sustainability, and the development of high‑impact solutions. We operate at the intersection of engineering, clean energy, and advanced technology—turning ideas into real products and systems. Joining EUNICE means working with teams that move quickly, think boldly, and deliver results. We value technical excellence, hands‑on problem solving, and continuous improvement. We invest in our people, encourage ownership, and give engineers the space to shape meaningful projects. At EUNICE, you will help build solutions that drive the energy transition, while developing your skills in a culture that rewards initiative, collaboration, and real‑world impact.
Role Overview
The Systems Reliability Engineer will ensure that EUNICE platforms achieve high‑grade reliability. The role introduces AI‑Ops practices, predictive monitoring, and self‑healing systems that guarantee 24/7 uptime. This position bridges infrastructure, software, and operations to embed resilience into every layer.
Key Responsibilities
Reliability & Performance.
Design and implement monitoring, alerting, and observability frameworks.
Leverage AI for predictive failure detection and system optimization.
Ensure 24/7 availability across HR, operations, and educational platforms.
Automation & Efficiency.
Introduce automated recovery and self‑healing systems.
Reduce manual interventions by scaling DevOps and SRE practices.
Continuously optimize system performance and resilience.
Collaboration.
Work with development teams to embed reliability in design.
Partner with infrastructure and AI architects for holistic solutions.
Advise leadership on reliability strategies and trade‑offs.
Qualifications
Bachelor’s in Computer Science, Engineering, or related field.
5+ years of experience in SRE, DevOps, or infrastructure engineering.
Knowledge of observability tools (Prometheus, Grafana, ELK, etc.).
Experience with cloud‑native reliability practices.
Familiarity with AI‑Ops frameworks and predictive monitoring.
Key Competencies
Reliability‑first mindset.
Analytical and problem‑solving ability.
Cross‑functional collaboration.
Continuous improvement orientation.
Clear communication.
Impact of the Role
The SRE role transforms EUNICE systems into reliable, trusted platforms. It ensures that digital operations never fail, enhancing credibility and enabling seamless global operations.
Special Skills
Advanced Observability: Ability to design end‑to‑end observability stacks (metrics, logs, traces) and diagnose complex distributed system issues.
AI‑Ops Proficiency: Hands‑on experience with AI‑driven monitoring, anomaly detection, and predictive analytics.
Automation Mastery: Strong skills in automating reliability workflows, including self‑healing scripts, automated rollbacks, and infrastructure‑as‑code.
Cloud Native Reliability: Deep familiarity with Kubernetes, service mesh technologies, autoscaling strategies, and resilient microservices design.
Chaos Engineering: Ability to design and execute controlled failure scenarios to validate system robustness.
Performance Engineering: Skilled in identifying bottlenecks, optimizing workloads, and tuning cloud/edge environments.
Incident Command: Strong capability to lead incident response, root‑cause analysis, and post‑mortem improvements.
Scalable Architecture Understanding: Ability to build systems that handle peak loads, fail gracefully, and recover instantly.
Security‑Aware Engineering: Knowledge of secure configurations, zero‑trust principles, and compliance‑aligned reliability.
Scripting & Automation Languages: Strong command of Python, Bash, Go, or similar languages for tooling and automation.
Application Process
To apply for this job, email your details to hr@eunice-group.com or fill in the following form.