About Quizlet:
At Quizlet, our mission is to help every learner achieve their outcomes in the most effective and delightful way. Our $1B+ learning platform serves tens of millions of students every month, including two-thirds of U.S. high schoolers and half of U.S. college students, powering over 2 billion learning interactions monthly.
We blend cognitive science with machine learning to personalize and enhance the learning experience for students, professionals, and lifelong learners alike. We’re energized by the potential to power more learners through multiple approaches and various tools.
Let’s Build the Future of Learning
Join us to design and deliver AI-powered learning tools that scale across the world and unlock human potential.
About the Role:
We're looking for an experienced Site Reliability Engineer to be a systems developer who engineers the software, tools, and automation required to scale our platform for the next generation of AI features.
We’re happy to share that this is an onsite position in our San Francisco office. To help foster team collaboration, we require that employees be in the office a minimum of three days per week (Monday, Wednesday, and Thursday), and as needed by your manager or the company. We believe that this working environment facilitates increased work efficiency and team partnership, and supports growth as an employee and organization.
In this role, you will:
- System Resilience & Self-Healing: Develop, test, and implement high-volume connection management and auto-scaling logic (in Go) for our Kubernetes clusters. Introduce and maintain self-healing and auto-remediation capabilities into core service layers to maintain 99.95% availability under peak load
- Dev Tooling & MTTR Improvement: Architect and implement tooling and automation that measurably improves our Mean Time To Resolution (MTTR)
- Optimize our CI/CD toolchain (GitHub Actions, CircleCI, ArgoCD) by writing and hardening deployment controllers and safety checks
- Deep Observability & Diagnostics: Design and deploy instrumentation within Datadog for high-volume monitoring
- Utilize deep-dive investigations and platforms like Jeli to drive architectural improvements that measurably reduce operational toil
- Architectural Consulting & Standards: Act as a senior subject matter expert, partnering directly with Product Engineering teams to review designs, define SLOs, and ensure new services are inherently resilient and operable before production deployment
- Data Layer Performance Tuning: Conduct advanced performance analysis and capacity planning for our transactional and analytical databases (Spanner, PlanetScale, BigQuery), optimizing query execution and eliminating hot-spotting
What you bring to the table:
- Proven History of Architectural Ownership: Experience owning architecture and driving reliability initiatives within complex, distributed production environments in an SRE, DevOps, or Infrastructure Engineering role. We also welcome experienced Software Engineers (backend/systems) with a dedicated focus on reliability, performance, and operational excellence
- Core Systems Development: Proficiency in Go and/or Python, with a proven history of developing and deploying production-grade automation and control plane components for infrastructure teams. You must be experienced and comfortable providing hands‑on operational support for critical, high-volume legacy codebases (like a PHP/HHVM/Hack monolith) while driving modernization efforts
- Kubernetes Control Plane Engineering: Deep operational and architectural knowledge of Kubernetes (GKE) and the complexities of service mesh (Istio) networking in a high-volume environment
- High-Volume Tooling: Expert experience with CI/CD systems (GitHub Actions, CircleCI, ArgoCD) and infrastructure‑as‑code tools (e.g., Terraform)
- Observability & Debugging: Mastery of high-volume monitoring and observability stacks like Datadog, and a proven ability to perform deep-dive root cause analysis under production load using systems like Jeli
- Systems Foundation: Solid understanding of Linux systems, networking, and the challenges of large-scale distributed cloud architectures on GCP (or AWS or an equivalent)
Compensation, Benefits & Perks:
- Quizlet is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. Salary transparency helps to mitigate unfair hiring practices when it comes to discrimination and pay gaps. Total compensation for this role is market competitive, including a starting base salary of $110,000 - $230,000, depending on location and experience, as well as company stock options
- Collaborate with your manager and team to create a healthy work‑life balance
- 20 vacation days that we expect you to take!
- Competitive health, dental, and vision insurance (100% employee and 75% dependent PPO, Dental, VSP Choice)
- Employer‑sponsored 401(k) plan with company match
- Access to LinkedIn Learning and other resources to support professional growth
- Paid Family Leave, FSA, HSA, Commuter benefits, and Wellness benefits
- 40 hours of annual paid time off to participate in volunteer programs of choice
- Participate in the Quizlet stock option program
Why Join Quizlet?
Massive reach: 60M+ users, 1B+ interactions per week