ZipRecruiter
Staff Software Engineer (Observability)
ZipRecruiter, San Francisco, California, United States, 94199
Overview
We build AI that puts people first. Instead of just making organizations more efficient, we ensure AI systems actually help humans thrive. We focus on healthcare because getting things right matters most in this field. Our technology provides organizations with confidence that their AI is functioning correctly before deployment through advanced testing and verification. We operate globally and follow strict regulations. Our team comes from diverse backgrounds to create AI that organizations can trust and that genuinely benefits people.

About this role
As a Staff Software Engineer (Observability) at Amigo, you'll build the monitoring, logging, and debugging infrastructure that ensures our AI agents operate reliably and transparently. You'll design systems that provide visibility into our platform's behavior, enabling our team to maintain reliability and quickly diagnose issues that arise.

What you'll do
Design and implement observability infrastructure across the entire platform
Build real-time monitoring systems that detect anomalies before they impact patient care
Create advanced debugging tools for complex distributed systems and AI model behavior
Implement distributed tracing systems that track requests across services
Design alerting systems that minimize false positives while catching all critical issues
Build dashboards and analytics tools that provide insights into system performance and health
Implement log aggregation and analysis systems for compliance and debugging
Create performance profiling tools for identifying bottlenecks in AI inference pipelines
Design systems for monitoring AI model drift and behavior changes over time
Build chaos engineering tools to test system resilience and failure modes

What we're looking for
7+ years of experience building observability and monitoring systems
Deep expertise with observability and distributed tracing tools
Strong experience with distributed systems and service architectures
Experience building monitoring for complex distributed systems and application performance
Knowledge of statistical analysis and anomaly detection techniques
Strong programming skills in multiple
Experience with time series databases and analytics
Understanding of SRE principles and practices
Experience with performance profiling and optimization
Strong debugging skills for complex distributed systems

Nice to have
Experience in healthcare, finance, or other regulated industries
Background with statistical monitoring and performance optimization
Experience with compliance monitoring and audit logging
Knowledge of healthcare data privacy and security requirements

Benefits
Health & Wellness:
Comprehensive health, dental, and vision insurance
Mental health support and wellness coaching
Flexible wellness stipend for fitness, therapy, or personal growth
Daily catered lunch and dinner

Growth & Development:
Annual learning budget for courses, books, or conferences
Conference attendance budget for professional development
Development setup of your choice
Academic collaboration opportunities

Compensation Range: $220K - $260K