Lambda

Senior Site Reliability Engineer - Observability

Lambda, Seattle, Washington, US, 98127

Join Lambda’s AI cloud mission as a Senior Site Reliability Engineer focused on Observability. This role requires onsite presence in the San Francisco office four days a week, with a remote work day on Tuesday.

Base Pay Range: $240,000 – $401,000 per year

What You’ll Do

Deploy and operate observability platforms for logging, metrics, and distributed tracing.

Automate the deployment and operation of these systems.

Set up monitoring for modern AI/HPC clusters.

Develop platform software that improves system reliability across Lambda engineering.

Lead other engineering teams to design and develop solutions for monitoring challenges.

Qualifications

8+ years of software engineering experience and 3+ years of experience with Go.

5+ years of experience with SRE practices.

Proven understanding of observability tools and practices.

Experience with Kubernetes deployment and monitoring.

Experience building CI/CD pipelines.

High standards for the quality and reliability of the solutions you build.

Ability to collaborate across team boundaries to meet observability needs.

Nice to Have

Monitoring AI systems or HPC clusters.

Prometheus and PromQL queries.

Messaging systems like NATS.

OpenTelemetry ecosystem experience.

Network monitoring (Ethernet, InfiniBand).

Dashboard design principles.

Linux fundamentals and system administration.

Infrastructure automation with Ansible and Terraform.

Benefits

Generous cash & equity compensation.

Health, dental, vision coverage.

Wellness and commuter stipends.

401(k) with 2% company match (USA).

Flexible paid time off.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.

A Final Note

You do not need to meet all of the listed expectations to apply for this position. Lambda is committed to building a team with a variety of backgrounds, experiences, and skills.