CloudIngest

Site Reliability Engineer

CloudIngest, Trenton, New Jersey, United States

Overview

Site Reliability Engineer (SRE) focused on Dynatrace, OpenTelemetry, and data observability, using tools like Splunk, Datadog, and New Relic.

Location: Berkeley Heights, NJ | Onsite Work Setting (5 days/week in the office required)

Role Overview

We’re seeking a skilled Site Reliability Engineer with deep expertise in OpenTelemetry and data observability platforms (Splunk, Datadog, New Relic) to enhance system reliability, performance monitoring, and incident response. You’ll design and implement telemetry pipelines, drive observability best practices, and collaborate across engineering teams to ensure our systems are measurable, scalable, and resilient.

Key Responsibilities

Design and implement telemetry pipelines using OpenTelemetry SDKs and collectors.
Integrate observability tools (Splunk, Datadog, or New Relic) with cloud-native and hybrid environments.
Develop and maintain dashboards, alerts, and SLOs for critical services.
Collaborate with DevOps and engineering teams to instrument applications for metrics, logs, and traces.
Lead incident analysis and postmortems, using observability data to identify root causes.
Advocate for observability-first engineering.

Key Qualifications/Skillset

10+ years of experience (senior role) or 7+ years (mid-level role) in SRE, DevOps, or Platform Engineering.
Strong hands-on experience with OpenTelemetry (SDKs, collectors, OTLP).
Deep familiarity with Splunk, Datadog, or New Relic.
Proficiency in cloud platforms (GCP, AWS, or Azure).
Solid understanding of distributed systems, microservices, and CI/CD pipelines.
Experience with infrastructure-as-code (Terraform, Helm, etc.) is a plus.
Strong scripting skills (Python, Bash, or Go preferred).

Seniority level

Mid-Senior level

Employment type

Contract

Job function

Information Technology, Project Management, and Consulting

Industries

Financial Services, Banking, and Investment Banking
