Raas Infotek
Observability Specialist
Location: New Jersey / Chicago, IL (Onsite)
Visa: GC/USC
We are seeking an experienced Observability Specialist with a strong focus on monitoring. In this critical role, you will help ensure the reliability, performance, and efficiency of our systems by implementing and managing modern observability practices and tooling. You will use your deep technical expertise to provide actionable insights, proactively identify potential issues, and drive continuous improvement across our infrastructure and applications. You will also lead technical discussions with customers on solutions such as Datadog, define roadmaps, and take part in new-feature discussions initiated by vendors and customers.
Key Responsibilities
Design and implement a comprehensive observability framework and roadmap.
Lead system performance benchmarking and optimization initiatives.
Establish automated recovery mechanisms for common failure scenarios.
Develop and enforce reliable monitoring solutions.
Create technical standards and approaches for resilient monitoring.
Participate in Root Cause Analysis (RCA) and post‑mortem processes.
Develop frameworks for correlating events across system failures.
Design, implement, and manage end‑to‑end observability solutions encompassing metrics, logs, and traces across infrastructure and applications.
Evaluate, deploy, and maintain tools for monitoring, logging, tracing, alerting, and automation.
Define intelligent alerting rules and escalation policies to ensure timely incident response.
Analyze observability data to identify trends, anomalies, and potential risks, generating actionable insights and reports.
Qualifications
Significant experience as an Observability Specialist or in a similar role, with a strong focus on observability and monitoring.
Deep understanding of observability principles and best practices (metrics, logging, tracing).
Experience implementing and managing centralized logging and monitoring systems.
Experience with cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes, OpenShift).
Familiarity with AIOps and ML‑based anomaly detection systems is a plus.
Background in database performance monitoring and optimization.
Knowledge of Service Level Objectives (SLOs) and KPI implementation.
Experience participating in Root Cause Analysis (RCA) and post‑mortem processes.
Understanding of compliance requirements related to monitoring and logging.
Excellent problem‑solving and analytical skills.
Strong communication and collaboration skills.