Sierra Business Solution

Software Engineer, Site Reliability (SRE)

Sierra Business Solution, San Francisco, California, United States, 94199

About Us

We are an in‑person company based in San Francisco with growing offices in Atlanta, New York, and London, building a platform that helps businesses create better, more human customer experiences with AI.

Our core values are Trust, Customer Obsession, Craftsmanship, Intensity, and Family.

Our founders are Bret Taylor, a former Salesforce and Facebook executive, and Clay Bavor, a former Google Labs leader.

What You’ll Do

Own Sierra’s observability stack—monitoring, alerting, logging, and tracing—to give engineers clear visibility into system health and performance.

Partner with product and platform engineers to design reliable, scalable systems from day one.

Design and implement scalable, secure cloud infrastructure (AWS) using Terraform and modern DevOps tooling.

Improve reliability and scalability of LLM deployments, ensuring robust, cost‑effective operation.

Lead improvements to deployment pipelines, CI/CD tooling, and incident‑management processes.

Define the foundation of SRE practices at Sierra, influencing culture, tooling, and best practices.

What You’ll Bring

5+ years of hands‑on experience in Site Reliability or infrastructure engineering for complex SaaS or cloud‑based systems.

Experience designing for availability, scalability, and reliability at both infrastructure and application layers.

Deep experience with Terraform, AWS services (including IAM and VPC), container orchestration, and cloud networking.

Strong background in observability systems (Prometheus, Grafana, Datadog, or similar).

Experience working with enterprise customers and familiarity with compliance and networking needs.

Comfortable working in fast‑moving environments and collaborating across teams.

Degree in Computer Science or equivalent professional experience.

Even Better

Experience with LLM infrastructure—optimizing inference, managing fine‑tuned models, or large‑scale deployment.

Early‑stage startup experience defining SRE culture and tooling from scratch.

Familiarity with incident‑management automation or self‑healing infrastructure patterns.

Benefits

Unlimited Paid Time Off

Medical, Dental, and Vision benefits

Life Insurance and Disability Benefits

401(k) retirement plan with company match

Parental Leave and fertility benefits via Carrot

Lunch, snacks, coffee, and discretionary stipend

Equity plans per applicable policies

Equality & Diversity

We actively encourage applicants of all backgrounds to apply. We strive to evaluate all applicants consistently without regard to race, color, religion, gender, sexual orientation, age, disability, veteran status, or any other protected characteristic.
