EverBank

Sr. Observability Engineer

EverBank, Jacksonville, Florida, United States, 32290

Senior Observability Engineer

The Sr. Observability Engineer plays a critical role in ensuring the reliability, availability, and performance of enterprise systems by designing and implementing observability solutions. This position supports the IT Incident Management Team (IMT) by providing actionable insights, telemetry, and automation to reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR). The role combines deep technical expertise in monitoring, logging, and tracing with a strong understanding of SRE principles.

Key Responsibilities And Duties

Designs, implements, and maintains observability tools (e.g., Splunk, Prometheus, Grafana, OpenTelemetry).

Develops dashboards, alerts, and automated workflows to support proactive incident detection.

Partners with IMT to provide real-time telemetry during major incidents.

Conducts root cause analysis using logs, metrics, and traces.

Improves incident response processes through automation and data-driven insights.

Defines and monitors Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets.

Collaborates with application and infrastructure teams to embed observability into CI/CD pipelines.

Identifies gaps in monitoring coverage and implements solutions.

Drives adoption of observability best practices across engineering teams.

Minimum Qualifications

3 years of technical experience supporting enterprise systems

Previous experience with observability tools or site reliability engineering

Preferred Qualifications

5 years of experience with observability tools (Splunk, ELK, Prometheus, Grafana, OpenTelemetry)

Proficiency in scripting languages (Python, Bash) and automation frameworks

Certifications in SRE, ITIL, or cloud technologies

Familiarity with cloud platforms (Azure, AWS, or GCP) and container orchestration (Kubernetes)

Experience with AIOps or machine learning for anomaly detection

Incident Management (IMT) - Provide incident analysis and runbooks, suggest improvements, and collaborate with the wider group

Build and publish operational KPIs - Sev1/Sev2, MTTR, MTTD, incident volume, application performance

CI/CD Tools - GitHub, Jenkins, Azure DevOps

Educational Requirements

University degree preferred

Physical Requirements

Sedentary work

Benefits

Medical, dental, vision & HSA/FSA

401(k) savings

Paid holidays & generous PTO

Additional wellness & voluntary benefits

Tuition reimbursement

Commuter Benefits

Life and Disability Insurance
