TikTok

Site Reliability Engineer, Compute - USDS

TikTok, Work From Home

Overview

Responsibilities

Site Reliability Engineering (SRE) at TikTok combines software and systems engineering to build and run large-scale, massively distributed, and fault-tolerant systems. In our team, you’ll have the opportunity to manage the complex challenges of scale, while using expertise in coding, algorithms, complexity analysis, and large-scale system design. We embrace a culture of diversity, intellectual curiosity, openness, and problem-solving. We encourage close collaboration while promoting self-direction.

To enhance collaboration and cross-functional partnerships, among other things, our organization currently follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time.

  • Develop and maintain automation procedures to maximize system efficiency and minimize human intervention.
  • Work closely with software engineering teams to design, deploy, and operate system components so that systems are functionally robust.
  • Ensure system scalability to handle growth in web traffic and data.
  • Implement monitoring tools and set up metrics to keep track of system health and performance.
  • Participate in on-call rotations, assist with incident management, and diagnose, resolve, and prevent production issues.
  • Conduct performance tests to find and address system bottlenecks.
  • Collaborate with teams across the organization to define Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
  • Practice sustainable user support, incident response, and blameless postmortems.

Qualifications

Minimum Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, or a related field with 3+ years of experience.
  • Proven work experience as a Site Reliability Engineer, Systems Engineer, or similar software engineering role.
  • Passion for operational excellence through methodical automation and engineering processes, using programming languages such as Go, Python, or others.
  • Experience in network architecture, database modeling, cloud systems and large-scale distributed systems.
  • Strong understanding of Linux operating systems and open-source technologies.
  • Excellent problem-solving skills, strategic thinking, and a strong ability to debug complex systems.
  • Exceptional communication skills and the ability to effectively collaborate with cross-functional teams.

Preferred Qualifications:

  • Knowledge of monitoring tools and methodologies (such as Prometheus and Grafana).
  • Experience with containers and container orchestration platforms such as Docker, Kubernetes, or equivalents.

Legal and Employment Details

Candidates for this position must be legally authorized to work in the United States. This position is not eligible for visa sponsorship or support.

About USDS

TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. U.S. Data Security (“USDS”) is a subsidiary of TikTok in the U.S. This new, security-first division was created to bring heightened focus and governance to our data protection policies and content assurance protocols to keep U.S. users safe. Our focus is on providing oversight and protection of the TikTok platform and U.S. user data, so millions of Americans can continue turning to TikTok to learn something new, earn a living, express themselves creatively, or be entertained. The teams within USDS that deliver on this commitment daily span Trust & Safety, Security & Privacy, Engineering, User & Product Ops, Corporate Functions, and more.

Data Security Statement: This role requires the ability to work with and support systems designed to protect sensitive data and information. As such, this role will be subject to strict national security-related screening.

Compensation and Benefits

For Pay Transparency: Compensation Description (Annually)

The base salary range for this position in the selected city is $136,800 - $259,200 annually. Compensation may vary outside of this range depending on a number of factors, including a candidate’s qualifications, skills, competencies and experience, and location. Base pay is one part of the Total Package that is provided to compensate and recognize employees for their work, and this role may be eligible for additional discretionary bonuses/incentives and restricted stock units. Benefits may vary depending on the nature of employment and the country of work. Employees have day-one access to medical, dental, and vision insurance, a 401(k) savings plan with company match, paid parental leave, short-term and long-term disability coverage, life insurance, and wellbeing benefits, among others. Employees also receive 10 paid holidays per year, 10 paid sick days per year, and 17 days of Paid Personal Time (prorated upon hire with increasing accruals by tenure). The Company reserves the right to modify or change these benefits programs at any time, with or without notice.

Location and Roles

Los Angeles County and other jurisdictions may apply their own job rules and norms; refer to the posting's location details for eligibility.