Elluminates Software
Senior Monitoring Engineer

Introduction
Elluminates Software provides innovation for Federal customers, including AI-driven SaaS, Cloud and On-Prem transformation, and advanced Infrastructure Automation. For over twenty years, we have worked closely with industry to innovate and collaborate on solutions with national and global technology platform impact.

Job Description
The Senior Monitoring Engineer is a senior-level technical expert accountable for advanced troubleshooting, performance analysis, and optimization of enterprise monitoring platforms. This position is responsible for the design, implementation, and ongoing enhancement of observability solutions in hybrid environments, including on-premises, cloud, and virtual infrastructure. The Senior Monitoring Engineer serves as the final escalation point for complex monitoring issues, collaborates with other teams to ensure system reliability, and promotes best practices in observability.

Duties and Responsibilities
• Serve as the Tier 3 escalation point for issues related to any of the monitoring/observability platforms and tools.
• Lead root cause analysis (RCA) for major incidents and recurring performance issues.
• Maintain, configure, and optimize monitoring tool deployments across cloud (e.g., AWS, Azure), on-premises, and VMware environments.
• Design and implement custom dashboards, synthetic monitoring, and service-level objectives (SLOs).
• Develop and maintain alerting strategies that reduce noise and ensure actionable notifications.
• Work closely with application, infrastructure, DevOps, and security teams to define monitoring requirements and integrate observability into CI/CD pipelines.
• Analyze metrics, logs, and traces to ensure end-to-end service visibility and performance optimization.
• Assist in onboarding applications and teams into the observability platform.
• Provide training and mentorship to Tier 1 and Tier 2 support teams.
• Ensure platform resilience, availability, and compliance with internal standards and SLAs.
• Participate in on-call rotations for high-priority incidents as needed.

Required Qualifications
• 5+ years of experience in IT infrastructure, application performance monitoring, or site reliability engineering (SRE).
• 2+ years of hands-on experience using platforms such as Dynatrace, Zabbix, and monitoring tools in VMware Cloud Foundation (VCF).
• Solid understanding of observability concepts, including metrics, logs, traces, and user experience monitoring.
• Experience supporting complex, distributed systems in cloud and hybrid environments.
• Proficiency with scripting and automation (e.g., PowerShell, Python, Bash, or Ansible).
• Strong understanding of networking, Linux/Windows systems, containers, and application architectures (microservices, APIs, etc.).

Desired Qualifications
• Dynatrace Associate or Professional Certification.
• Experience with Dynatrace, including OneAgent deployment, Smartscape, PurePath, and Davis AI.
• Experience integrating Dynatrace with tools such as ServiceNow, Splunk, Jira, or CI/CD pipelines.
• Experience with other observability tools (e.g., Prometheus, Grafana, New Relic, AppDynamics, Splunk, Elastic).
• Familiarity with DevOps practices and Infrastructure-as-Code (e.g., Terraform).
• Understanding of the ITIL framework and change management processes.
• Excellent troubleshooting and problem-solving skills.
• Strong written and verbal communication.
• Ability to work independently and collaboratively across teams.
• Customer-focused mindset and attention to detail.
• Continuous learning and adaptability in a fast-paced environment.

Years of Experience and Education Requirements
• Bachelor's degree and nine (9) or more years of experience; Master's degree and seven (7) or more years of experience; or PhD or JD and four (4) or more years of experience. Additional experience may be considered in lieu of a degree.

Type: Full-Time
Clearance: Secret (with ability to obtain TS)
Location: On Site in Springfield, Virginia
Shift: Normal Business Hours
Type of Travel: Local