LMArena

DevOps Engineer, Site Reliability Engineering (SRE)

LMArena, San Francisco, California, United States, 94199


Position Overview

We are seeking an experienced, security-minded Site Reliability Engineer to own and elevate our infrastructure, processes, and operational security. You will:

Take end-to-end ownership of infrastructure operations across Cloudflare, Vercel, and our CI/CD pipelines.

Embed security best practices into every layer of the stack, ensuring resilience against emerging threats.

Establish processes and procedures that make onboarding and ramp-up efficient for new team members, and mentor incoming and junior members of the team.

This role is ideal for a seasoned SRE who thrives at the intersection of reliability, performance, and security, and who brings the rigor needed to keep fast-moving product teams focused on innovation.

Key Responsibilities

Infrastructure as Code: Manage Terraform modules and secrets pipelines; champion immutable, auditable infrastructure.

Cloudflare Operations: Configure, monitor, and harden WAF, DDoS protections, bot management, and CDN caching strategies.

Vercel & Edge Runtime: Own deployment architecture, performance tuning, and incident response for our Next.js-based front end and Edge Functions.

CI/CD & Release Engineering: Design, implement, and maintain secure pipelines (GitHub Actions, Vercel integrations) with automated testing and vulnerability scanning.

Change Management & Documentation: Establish and enforce a lightweight but disciplined RFC/change-control process; maintain comprehensive runbooks and architecture diagrams.

Observability & Incident Response: Expand monitoring, logging, and alerting; lead post-incident reviews and drive continual improvement.

Mentorship: Provide day-to-day guidance to engineers and junior SREs, fostering a culture of ownership and learning.

Compliance Support: Partner with ProdSec and GRC teams on SOC 2, ISO 27001, and customer security questionnaires.

Manage and maintain internal- and external-facing infrastructure.

Maintain log aggregation requirements across the business, and configure the infrastructure used to store those logs.

Required Qualifications

7+ years in SRE/DevOps roles for high-traffic SaaS or consumer web products.

Proven expertise securing and scaling Cloudflare and Vercel (or comparable CDN/edge and serverless platforms).

Deep understanding of web application security, networking, TLS, and zero-trust principles.

Strong proficiency with infrastructure as code (Terraform, Pulumi, or similar) and serverless build pipelines (GitHub Actions or similar).

Strong programming and scripting skills (Golang, Python, TypeScript).

Demonstrated success designing and enforcing change-management workflows.

Excellent written communication, with the ability to produce clear runbooks and architecture docs.

Track record mentoring or leading junior engineers.

Nice-to-Have

Experience with container orchestration (Kubernetes or Nomad).

Experience with serverless stacks.

Certifications such as AWS/GCP Professional, GIAC-GCSA, CKS, or CISSP.

Why You'll Love Working Here

Impact: You'll set the foundation for reliability and security across a rapidly growing AI benchmarking platform.

Culture: Engineering-first, documentation-driven, and community-obsessed.

Compensation: Competitive salary, meaningful equity, comprehensive benefits, and a professional-development budget.