CloudIngest
Overview
Site Reliability Engineer (SRE) focused on Dynatrace, OpenTelemetry, and data observability using tools like Splunk, Datadog, and New Relic.
Location: Berkeley Heights, NJ | Onsite work setting (5 days/week in the office required)

Role Overview
We’re seeking a skilled Site Reliability Engineer with deep expertise in OpenTelemetry and data observability platforms (Splunk, Datadog, New Relic) to enhance system reliability, performance monitoring, and incident response. You’ll design and implement telemetry pipelines, drive observability best practices, and collaborate across engineering teams to ensure our systems are measurable, scalable, and resilient.

Key Responsibilities
Design and implement telemetry pipelines using OpenTelemetry SDKs and collectors.
Integrate observability tools (Splunk, Datadog, or New Relic) with cloud-native and hybrid environments.
Develop and maintain dashboards, alerts, and SLOs for critical services.
Collaborate with DevOps and engineering teams to instrument applications for metrics, logs, and traces.
Lead incident analysis and postmortems, using observability data to identify root causes.
Advocate for observability-first engineering.

Key Qualifications/Skillset
10+ years (senior role) or 7+ years (mid-level role) of experience in SRE, DevOps, or Platform Engineering.
Strong hands-on experience with OpenTelemetry (SDKs, collectors, OTLP).
Deep familiarity with Splunk, Datadog, or New Relic.
Proficiency in cloud platforms (GCP, AWS, or Azure).
Solid understanding of distributed systems, microservices, and CI/CD pipelines.
Experience with infrastructure-as-code (Terraform, Helm, etc.) is a plus.
Strong scripting skills (Python, Bash, or Go preferred).

Seniority level
Mid-Senior level

Employment type
Contract

Job function
Information Technology, Project Management, and Consulting

Industries
Financial Services, Banking, and Investment Banking