Litmus7

SRE Manager

Litmus7, San Ramon, California, United States, 94583

Overview

A Site Reliability Engineering role focused on monitoring and protecting customer applications and leading operational tasks to keep systems reliable, available, and performant. The position works with Litmus7 leadership and customer teams to deliver 24x7 production support in an e-commerce environment.

Responsibilities

- Monitor, automate, and improve the reliability, performance, and availability of applications and services.
- Lead SRE activities at customer sites and act as the on-site Litmus7 representative, collaborating with Litmus7 leadership.
- Provide Production Application Support and coordinate with offshore teams in India (IND) to maintain 24x7 coverage, including India night hours.
- Gather and communicate SRE requirements from both technical and non-technical stakeholders; define health metrics and service level expectations.
- Collect requirements on the health of the applications and services to be monitored, and establish appropriate service levels.
- Demonstrate knowledge of Level 1, Level 2, and Level 3 support on e-commerce platforms (e.g., Shopify, Blue Yonder, or similar).
- Apply hands-on experience with monitoring, logging, alerting, dashboarding, and reporting in tools such as AppDynamics, Splunk, Dynatrace, Datadog, CloudWatch, ELK, Prometheus, and New Relic.
- Work with customers on tools like New Relic and PagerDuty; apply SRE principles covering logs, metrics, availability, incidents, change management, production deployments, risk mitigation, and SLAs/SLIs.
- Lead P1 incident communications, coordinate with customers, and drive RCA with leads and stakeholders.
- Create and maintain SOPs and runbooks; work with ITSM platforms (JIRA, ServiceNow, BMC Remedy).
- Collaborate with Dev teams and cross-functional teams across time zones.
- Generate workload summaries (WSR/MSR) by extracting tickets from ITSM platforms, and present findings to customers and Litmus7 leaders.

Qualifications

- Mandatory: experience as an SRE Lead or in an SRE/techno-functional role at a customer location in the e-commerce/retail domain.
- Strong knowledge of Production Application Support and 24x7 operational coverage.
- Experience interacting with offshore teams and coordinating across time zones.
- Ability to gather and communicate requirements for system health, availability metrics, and service levels.
- Good knowledge of Level 1, 2, and 3 support on e-commerce platforms.
- Proficiency with monitoring, logging, alerting, dashboarding, and reporting tools (e.g., AppDynamics, Splunk, Dynatrace, Datadog, CloudWatch, ELK, Prometheus, New Relic).
- Familiarity with ITSM platforms (JIRA, ServiceNow, BMC Remedy) and ability to develop SOPs/runbooks.
- Experience with Postman and API testing is a plus.
- Strong communication skills and ability to explain technical concepts to non-technical stakeholders.
- Ability to work in a fast-paced, evolving environment and to collaborate with cloud, infrastructure, project management, and management teams.
- Ability to create detailed documentation and operate with a can-do attitude.

Non-Technical Requirements

- Clear communication and understanding of technical ideas; professional demeanor when interacting with peers and stakeholders.
- Ability to work with teams in different time zones and collaborate across functions.
- Excellent written and verbal communication; motivated, goal-driven, innovative, and collaborative.

Note: This job description includes the core responsibilities and qualifications for the SRE Manager role at Litmus7. Other duties may be assigned as needed.
