Postman
Member of Technical Staff, AI Reliability & Monitoring Engineering Lead
Postman, San Francisco, California, United States, 94199
What You’ll Do
- Develop and manage reliability metrics (SLOs) for AI-driven API services and agentic AI platform features
- Implement comprehensive observability and monitoring systems for real-time performance and fault detection
- Design and drive automated failover, recovery, and incident response strategies for high-availability AI infrastructure
- Optimize resource utilization, particularly GPU/accelerator efficiency, ensuring cost-effective AI system operation
- Collaborate closely with engineering, platform, and product teams to align reliability efforts with broader organizational goals
- Lead efforts to build internal tooling and automation focused on AI system stability and operational excellence
- Drive continuous improvement in deployment practices, monitoring approaches, and incident management processes
About You
- Have a strong background in AI reliability engineering, SRE, or DevOps for distributed systems
- Understand the unique challenges of maintaining large-scale AI systems and integrating AI-specific metrics into reliability frameworks
- Are experienced with cloud platforms, monitoring tools, and incident response automation
- Are comfortable collaborating across teams to influence best practices for AI system reliability and operational health
- Thrive in dynamic, fast-paced environments focusing on delivering reliable, safe AI-powered services
Bonus Skills And Experiences
- Hands-on experience with AI/ML infrastructure, including GPU/xPU optimization and scaling
- Familiarity with API platform operations and large-scale distributed services
- Prior experience building or operating observability tools tailored for AI and agentic systems
- Contribution to open-source projects or reliability engineering thought leadership
Compensation
The reasonably estimated base salary for this role ranges from $256,000 to $276,000, plus a competitive equity package. Actual compensation is based on the candidate's skills, qualifications, and experience.
What Else
In addition to Postman's pay-for-performance philosophy and a flexible schedule, Postman offers a comprehensive benefits package including full medical coverage, flexible PTO, wellness reimbursement, and a monthly lunch stipend. Our wellness programs support physical and mental health, and team-building events help maintain connection. We offer a donation-matching program and strive for an inclusive culture where everyone can be their best.
At Postman, we embrace a hybrid work model. For roles based in the San Francisco Bay Area, Boston, Bangalore, Hyderabad, and New York, employees are expected to come to the office 3 days a week.
Equal Opportunity
Postman is an Equal Employment Opportunity and Affirmative Action Employer. Qualified applicants will receive consideration without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, marital status, protected veteran status, or disability status.
Company Details
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology