MongoDB

Site Reliability Engineer (Senior or Staff), Observability

MongoDB, Austin, Texas, US, 78716

Join MongoDB to apply for the Site Reliability Engineer (Senior or Staff), Observability role.

MongoDB’s mission is to empower innovators to create, transform, and disrupt industries by unleashing the power of software and data. We enable organizations of all sizes to build, scale, and run modern applications, modernizing legacy workloads and embracing AI. Our globally distributed, multi‑cloud database, MongoDB Atlas, powers companies such as Samsung and Toyota and is available in more than 115 regions across AWS, Google Cloud, and Microsoft Azure.

Team and Role Overview

The SRE Observability team is part of the larger Platform Engineering organization. We build and maintain the observability stack (metrics, logging, and tracing) used by all engineering teams. We own the telemetry pipeline, monitoring, and alerting infrastructure, using tools such as VictoriaMetrics, Splunk, Quickwit, Jaeger, Fluent Bit, and Vector. As an engineer on this team, you will collaborate with other SWE and SRE teams to instrument and monitor services, promote best practices, and own critical internal infrastructure.

This role can be based at our NYC HQ on a hybrid basis or fully remote from Eastern or Central time zones.

Responsibilities

Define standards and vision for the mission‑critical observability platform leveraged by all parts of the engineering organization.
Design, architect, build, and deliver core pieces of our observability services in collaboration with other stakeholders.
Design, implement, and troubleshoot monitoring for globally distributed services across several cloud providers.
Ensure reliability, resilience, fault‑tolerance, and self‑healing of services and infrastructure.
Identify and configure key metrics to detect incidents and quantify service health, availability, and performance.
Participate in a week‑long on‑call rotation and blameless post‑mortem process.
Improve observability capabilities, optimizing for cost, ease of use, and maintainability.

Requirements

Experience running mission‑critical services at scale.
Experience with observability of large‑scale distributed systems.
Understanding of information security issues.
Firm grasp of at least one modern programming language, beyond basic scripting.
Solid understanding of web and network protocols and standards (HTTP, TLS, DNS, etc.).
Bachelor’s degree in Computer Science or equivalent experience.

Nice to Haves

Experience with at least one major cloud provider (AWS, GCP, Azure).
Experience working in a Kubernetes‑based environment.

Benefits

Generous compensation package.
Opportunities to learn on the job and up‑skill in new technologies.
High level of independence in day‑to‑day work.

MongoDB is committed to providing any necessary accommodations for individuals with disabilities within our application and interview process. To request an accommodation due to a disability, please inform your recruiter.

MongoDB, Inc. provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type and makes all hiring decisions without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.

MongoDB’s base salary range for this role is $127,000 to $249,000 USD (U.S.‑based candidates).