Romeo

Senior Infrastructure & Systems Engineer

Romeo, Germantown, Ohio, United States

We are seeking an experienced Senior Infrastructure & Systems Engineer to help operate, maintain, and improve our on‑premise infrastructure environment. You will work hands‑on with Linux systems, Kubernetes, RabbitMQ, Redis, and Elasticsearch, ensuring that our platforms are secure, scalable, and highly available.

This role is ideal for a technically driven engineer who thrives in complex, production‑grade infrastructure environments and enjoys optimizing systems for reliability and performance.

This position offers a hybrid work mode, combining on‑site collaboration with flexible remote work.

Key Responsibilities

Manage and enhance Kubernetes clusters, including configuration, upgrades, scaling, and deployment automation using Helm and Docker.

Operate, maintain, and optimize Linux‑based systems in an on‑premise datacenter environment.

Manage, monitor, and troubleshoot RabbitMQ clusters, ensuring message delivery reliability, scalability, and fault tolerance.

Administer and optimize Redis, Elasticsearch, and MySQL databases for performance, stability, and data integrity.

Support and execute database migration and infrastructure modernization projects within the on‑prem environment.

Implement and maintain infrastructure‑as‑code practices using Terraform, Ansible, GitLab CI/CD, and Puppet.

Maintain and improve monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, Zabbix, ELK).

Collaborate with development and product teams to support deployment pipelines and performance optimization.

Participate in incident response, root cause analysis, and on‑call rotations to ensure system reliability.

Ensure all systems meet security and data protection requirements.

Qualifications

5+ years of hands‑on experience in system or infrastructure engineering, focused on on‑premise environments.

Strong expertise in Linux system administration (Debian/Ubuntu or RHEL/CentOS).

Deep understanding of RabbitMQ, including clustering, high availability, performance tuning, and troubleshooting.

Experience managing and optimizing Redis and Elasticsearch in production.

Solid practical knowledge of Kubernetes, Docker, and Helm, including cluster management, deployments, upgrades, and troubleshooting.

Experience with infrastructure automation and configuration management using Terraform, Ansible, GitLab CI/CD, and Puppet.

Strong understanding of networking, including routing, firewalls, VPNs, and secure access.

Experience with monitoring and observability tools (Prometheus, Grafana, Zabbix, ELK).

Excellent problem‑solving and analytical skills, with attention to performance, reliability, and maintainability.

Familiarity with cloud providers such as AWS would be a plus.

Preferred Qualifications

Experience managing RabbitMQ, Redis, Elasticsearch, and MySQL at scale in production environments.

Experience working with high‑availability and distributed on‑prem systems.

Familiarity with container security and system hardening best practices.

Knowledge of disaster recovery and business continuity planning for on‑prem environments.
