TikTok
Site Reliability Engineer, Infrastructure and Assurance Services - USDS
TikTok, Seattle, Washington, US, 98127
As a member of the Systems and Networking team, you will ensure the seamless operation of TikTok’s U.S. physical infrastructure. Your primary responsibilities include provisioning physical servers, maintaining the U.S. physical network, collaborating with vendors such as OCI and Akamai, and supporting internal platforms that enable daily operations across products, e‑commerce, ads, and monetization.
Responsibilities
Design, develop, and maintain infrastructure automation solutions for efficient global operations and comprehensive monitoring.
Partner with engineering teams to design, deploy, operate, and continuously improve scalable systems and services throughout the service lifecycle.
Proactively monitor system health, conduct performance testing, and manage incidents to maximize uptime and adherence to defined SLAs/SLOs.
Perform on‑call duties and production operations, including change management, capacity planning, disaster recovery, and process improvements across teams.
Qualifications
Minimum Qualifications:
Bachelor’s degree in Computer Science, related field, or equivalent practical experience.
Demonstrated experience in software development with one or more programming languages.
Experience with Linux, networking, database concepts, monitoring, and shell scripting.
Superb analytical, problem‑solving, and critical‑thinking skills.
Excellent communicator, team player, self‑starter, and fast learner.
Preferred Qualifications:
Master’s degree in Computer Science, Engineering, or related field.
Proficiency in Python, GoLang, or C++.
Expertise in SRE philosophy, AIOps, APM, or disaster recovery.
Expertise with Kubernetes, Elasticsearch, ClickHouse, message queues, OpenTSDB, or service mesh.
Legal and Diversity Statements
As a condition of employment, all successful candidates must be able to establish authorization to work in the United States. For this position, the Company does not provide sponsorship for any immigration‑related benefits.
USDS is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs, or other protected reasons. If you need assistance or a reasonable accommodation, please reach out to us at https://tinyurl.com/USDS-RA.
This role requires the ability to work with and support systems designed to protect sensitive data and information. As such, this role will be subject to strict national security‑related screening.
We strive to do great things with great people. We lead with curiosity, humility, and a desire to make an impact in a rapidly growing tech company. Every challenge is an opportunity to learn and innovate as one team. We’re resilient and embrace challenges as they come. By constantly iterating and fostering an "Always Day 1" mindset, we achieve meaningful breakthroughs for ourselves, our company, and our users. Join us.