Coupang

Sr. Staff Site Reliability Engineer

Coupang, Seattle, Washington, US, 98127

Job Overview

Site Reliability Engineers (SREs) at Coupang fill a mission‑critical role that combines software and systems engineering to build, run, and scale our complex, large‑scale e‑commerce systems. As part of the Site Reliability Engineering team, you will be responsible for ensuring all customer‑facing services are healthy, monitored, automated, and designed to scale. We take pride in treating operations as an engineering problem, with an automation‑first approach. You will build best‑in‑class infrastructure automation for Observability, Incident Management, Disaster Recovery, Load Testing, Capacity Engineering, and more. You will work closely with product development teams from early design through to production incidents, maintain SLI/SLA bars, and influence design with SRE principles and best practices. If you take pride in ownership, enjoy solving complex technical challenges for large‑scale distributed systems, and communicate effectively across boundaries, this is the role for you.

Key Responsibilities

Serve as the primary point of responsibility for the platform reliability, health, and performance of all Coupang customer‑facing services.

Gain deep knowledge of Coupang application workflows and dependencies.

Define and track KPIs and SLOs related to system availability, performance, and reliability.

Build world‑class incident management processes and automation, including fast incident remediation, operational reviews, and retrospectives.

Develop and implement best practices for creating, scaling, and maintaining effective monitoring, alerting, and telemetry systems.

Build automation to execute regular Disaster Recovery, Chaos, and load testing to stay ahead of growth.

Work closely with product teams to ensure designs incorporate scale and operability.

Build guardrails and automation for deploying production changes while holding the reliability bar.

Participate in a 24x7 rotation for production issue escalations, functioning well in a fast‑paced environment.

Communicate effectively with stakeholders at all levels of the organization.

Basic Qualifications

Bachelor's degree in computer science, engineering, or a related technical field.

8+ years of industry experience building and operating large‑scale distributed systems.

Preferred Qualifications

Prior experience with AI/ML, large‑scale web‑based Java architectures, and JVM configuration.

Professional certifications in cloud platforms, monitoring tools, or related technologies.

Previous experience working on a large‑scale GPU/Cloud Infrastructure platform.

SLO/SLA management and implementation experience.

Deep UNIX/Linux systems knowledge and administration background.

Demonstrated programming skills in Python, Java, Golang, or Ruby.

Strong problem‑solving and analytical skills across systems, network (TCP/IP), and code.

Experience with cloud‑based GPU infrastructure (AWS, Azure, or GCP).

Strong understanding of DevOps and SRE practices, including CI/CD and IaC.

Experience with containerization and orchestration technologies such as Docker and Kubernetes.

Excellent communication and collaboration skills, with the ability to work across distinct technical domains.

Knowledge of the OpenTelemetry observability ecosystem, including metrics, logging, tracing, and tools such as Prometheus, Grafana, Elastic Stack, Datadog, or New Relic.

Pay & Benefits

Our compensation reflects the cost of labor across several U.S. geographic markets. At Coupang, your base pay is one part of your total compensation. The base pay for this position ranges from $176,000 per year in our lowest geographic market to $221,000 per year in our highest geographic market. Pay is based on several factors including market location and may vary depending on job‑related knowledge, skills, and experience.

General Description of All Benefits

Medical/Dental/Vision/Life, AD&D insurance

Flexible Spending Accounts (FSA) and Health Savings Account (HSA)

Long‑term/Short‑term Disability

Employee Assistance Program (EAP)

401(k) plan with company match

18‑21 days of paid time off (PTO) a year based on tenure

12 public holidays

Paid parental leave

Pre‑tax commuter benefits

Electric car charging station (MTV – Free)

General Description of Other Compensation

Other compensation includes, but is not limited to, bonuses, equity, or other forms of compensation that would be offered to the hired applicant in addition to their established salary range or wage scale.

Equal Opportunities for All

Coupang is an equal‑opportunity employer. All qualified applicants will receive consideration for employment without regard to actual or perceived race, color, religion, gender, sexual orientation, ancestry, national origin, age, disability, medical condition, genetic information, military or veteran status, or other protected characteristics. Coupang is also committed to providing a safe work environment for its employees and consumers. If you need assistance and/or a reasonable accommodation in the recruiting process due to a disability, please contact us at usrecruiting@coupang.com.

Requisition: R0065794.

Our unprecedented success would not be possible without the valuable input of our globally diverse team.
