Bayone

Lead Site Reliability Engineer

Bayone, San Ramon


Job Description:
  • As a Senior/Lead Site Reliability Engineer, you'll take ownership of the reliability, performance, and scalability of high-traffic retail platforms.
  • This role demands deep experience in cloud-native environments, a strong observability mindset (New Relic expertise is required), and the ability to lead both incident response and system design discussions with client teams.
  • You'll serve as a technical leader and mentor, partnering with engineering, DevOps, and product teams to build resilient systems for real-time retail operations, including eCommerce platforms such as Shopify (Shopify experience is a bonus).
Key Responsibilities:
  • Lead reliability and observability strategy for large-scale retail systems.
  • Architect and implement robust monitoring using New Relic: dashboards, SLOs, alerts, synthetic monitoring, and more.
  • Guide incident response processes and run blameless postmortems.
  • Own availability, performance, and scalability of customer-facing apps and services.
  • Design infrastructure for high availability using Kubernetes, Docker, and IaC tools (Terraform, CloudFormation).
  • Collaborate with client engineering teams to optimize system behavior during retail surges (e.g., Black Friday).
  • Mentor junior SREs and set operational best practices.
  • Partner with development and QA teams to integrate performance testing and failure injection into CI/CD workflows.
  • Advocate for DevOps/SRE best practices (shift-left monitoring, chaos testing, performance budgets).
Required Qualifications:
  • 8+ years in Site Reliability Engineering, DevOps, or Platform Engineering.
  • Expertise with New Relic; must be able to architect observability end-to-end.
  • Proven experience supporting retail or eCommerce platforms at scale.
  • Strong coding and scripting skills (Python, Bash, or Go).
  • Production experience with AWS/GCP/Azure and Kubernetes.
  • Deep understanding of infrastructure automation (Terraform, Ansible, or Pulumi).
  • Strong communication skills, client-facing presence, and leadership ability.

Nice to Have:
  • Experience with Shopify or headless commerce stacks.
  • Experience leading distributed teams.
  • Familiarity with traffic-heavy retail events and strategies (caching, autoscaling, edge optimization).
  • Experience integrating monitoring into microservices, APIs, and frontend apps.