ZipRecruiter
Senior Software Engineer - Observability
ZipRecruiter, San Carlos, California, United States, 94071
POSITION SUMMARY
As the Senior Software Engineer focused on Observability, you will set observability standards, lead automation efforts and mentor engineers, ensuring that all monitoring and Datadog configuration changes are implemented as Infrastructure-as-Code (IaC). You will lead the design and management of a code-driven Datadog observability platform, providing end-to-end visibility into Java applications, Kubernetes workloads and containerized infrastructure. This role emphasizes cost-effective observability at scale, requiring deep expertise in Datadog monitoring, logging, tracing and optimization techniques. You'll collaborate closely with SRE, DevOps and Software Engineering teams to standardize monitoring and logging practices and deliver scalable, reliable and cost-efficient observability solutions. This is a hands-on engineering role focused on observability-as-code. All monitoring, logging, alerting and Datadog configurations are defined and managed through Terraform, APIs and CI/CD workflows, not through manual configuration in the Datadog UI.
PRIMARY RESPONSIBILITIES
Own and define observability standards for Java applications, Kubernetes workloads and cloud infrastructure
Configure and manage the Datadog platform using Terraform and Infrastructure-as-Code (IaC) best practices
Drive adoption of structured JSON logging, distributed tracing and custom metrics across Java and Python services
Optimize Datadog usage through cost governance, log filtering, sampling strategies and automated reporting
Collaborate closely with Java developers and platform engineers to standardize instrumentation and alerting
Troubleshoot and resolve issues with missing or misconfigured logs, metrics and traces, working with developers to ensure proper instrumentation and data flow into Datadog
Lead incident response efforts using Datadog insights for actionable alerting, root cause analysis (RCA) and reliability improvements
Serve as the primary point of contact for Datadog-related requests, supporting internal teams with onboarding, integration and usage questions
Continuously audit and tune monitors for alert quality, reducing false positives and improving actionable signal detection
Maintain clear internal documentation on Datadog usage, standards, integrations and IaC workflows
Evaluate and propose improvements to the observability stack, including new Datadog features, OpenTelemetry adoption and future architecture changes
Mentor engineers and develop internal training programs on Datadog, observability-as-code and modern log pipeline architecture
QUALIFICATIONS
Bachelor's degree in Computer Science, Engineering, Mathematics, Physics or a related technical field
6+ years of professional software engineering experience building production-grade systems with emphasis on automation, integrations and infrastructure tooling
Proficiency in at least one modern programming language such as Go, Java, C++ or Python, with the ability to design, implement and maintain reusable code libraries, not just scripts
Proven experience developing and maintaining observability-as-code using tools such as Terraform
Hands-on experience managing observability platforms such as Datadog, New Relic or Dynatrace as code, using Terraform modules, APIs and CI/CD workflows at scale, with deep expertise in APM, logs, metrics, tracing, dashboards and audit trails
Experience integrating observability into CI/CD pipelines such as GitLab CI, GitHub Actions or AWS CodePipeline
Solid understanding of AWS cloud services and monitoring practices for Kubernetes workloads
Experience designing and implementing custom monitoring libraries, exporters or telemetry pipelines such as OpenTelemetry or Prometheus
Experience with cost optimization strategies in observability platforms
Contributions to open-source infrastructure-as-code, DevOps or observability tooling
Mentorship experience with the ability to coach teams on observability best practices
OUR OPPORTUNITY
Natera is a global leader in cell-free DNA (cfDNA) testing, dedicated to oncology, women's health, and organ health. Our aim is to make personalized genetic testing and diagnostics part of the standard of care to protect health and enable earlier and more targeted interventions that lead to longer, healthier lives.
WHAT WE OFFER
Competitive Benefits - Employee benefits include comprehensive medical, dental, vision and life plans for eligible employees and their dependents. Additionally, Natera employees and their immediate families receive free testing in addition to fertility care benefits. Other benefits include baby bonding leave, 401k benefits, commuter benefits and much more. We also offer a generous employee referral program!
Natera is proud to be an Equal Opportunity Employer. We are committed to ensuring a diverse and inclusive workplace environment, and welcome people of different backgrounds, experiences, abilities and perspectives. Inclusive collaboration benefits our employees, our community and our patients, and is critical to our mission of changing the management of disease worldwide. All qualified applicants are encouraged to apply, and will be considered without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other legally protected status. We also consider qualified applicants regardless of criminal histories, consistent with applicable laws.