TikTok

Production System Engineer - San Jose

TikTok, San Jose, California, United States, 95199

Responsibilities

The Data Systems Infrastructure (DSI) team stands as the unseen architects behind the scenes. In a thrilling dance of technology and innovation, we propel the company's meteoric rise by constructing and orchestrating colossal data fortresses, taming the life cycle of server fleets, conjuring Cloud solutions, and crafting a symphony of infrastructure services. Our mission is to ensure scalability and unwavering reliability, making sure ByteDance's digital footprint leaves an indelible mark on the world.

Embark on an exciting expedition to explore the rapidly expanding ByteDance domain in the United States, Europe, and Asia. Here, the Data Systems Infrastructure (DSI) team is crafting monumental data citadels that encircle the planet, sheltering hundreds of thousands of servers. As the maestro of our production systems, you will set out on a captivating odyssey, taming the life cycles of these servers. Your adventure will begin with the orchestration of their initial deployment, navigating the intricate terrain of OS installation, summoning services like a digital magician, and maintaining vigilant watch over our inventory. But, like any epic tale, there will be times of challenge when you become a troubleshooter extraordinaire, mending and restoring with unwavering dedication. Eventually, you'll guide them into the sunset, orchestrating their decommissioning and ensuring their rebirth through recycling, all while contributing to the pulsating rhythm of ByteDance's technological evolution.

Operation: Contribute to enhancing the stability, efficiency, effectiveness, and scalability of our data center and server operations, platform, and service on a worldwide scale.

Lifecycle Enhancement: Participate in and enhance the entire lifecycle of the server fleet – from system design and introduction consultation to launch reviews, deployment, operation, and retirement.

Automation: Develop and deploy tools and solutions to enhance the automation, reliability, scalability, and operability of servers in the datacenter.

Monitoring: Develop and deploy tools and solutions for improving the availability, latency, and overall service of the datacenter infrastructure, server, and network health.

Disaster Recovery: Troubleshoot and resolve complex technical issues in a high‑pressure, fast‑paced environment. Conduct high‑level root‑cause analysis for service interruption and establish preventive measures. Practice sustainable incident response and post‑mortem.

Cross‑team Collaboration: Collaborate with stakeholders such as infrastructure architects, project managers, data center operations engineers, platform developers, supply chain teams, and internal customers to comprehend overarching business objectives. Additionally, design and implement innovative solutions for our Core IDCs and CDN/Edge.

On‑call: Engage in our on‑call support spanning across regions and incident response teams to address critical issues in the production environment.

Qualifications

Minimum Qualifications

Education: Bachelor’s degree in Computer Science, Electronic Engineering, relevant technical field, or equivalent practical experience.

Experience in at least one of the areas below:

Server Operations: Proficiency in Linux system administration, deep understanding of kernels, drivers, and modules, scripting in Bash and Python for routine operations, performance tuning, security management, hardware troubleshooting, and participation in large‑scale data center planning and operation.

Tooling Adaptation, Deployment, and Maintenance: Customizing operation and maintenance tools for new server hardware, managing the software tool lifecycle, facilitating monitoring of server performance, provisioning resources, fault management, and developing/maintaining monitoring software for >10,000 servers.

Communication: Experience managing and coordinating teams in a global context.

Preferred Qualifications

Three years of related work experience.

Intermediate‑level expertise in data center operations, OS installations, break‑fix, planning and operations, and renovation activities.

Proficiency in operation and maintenance of GPU servers.

Full Stack Software Development: Ability to create and integrate RESTful APIs using Flask, proficiency in JavaScript and Node.js for front‑end and back‑end development, SQL database design and queries, familiarity with Redis, and use of Ansible for configuration management, application deployment, and task execution.

About TikTok

TikTok is the leading destination for short‑form mobile video. At TikTok, our mission is to inspire creativity and bring joy. TikTok's global headquarters are in Los Angeles and Singapore, and we also have offices in New York City, London, Dublin, Paris, Berlin, Dubai, Jakarta, Seoul, and Tokyo.

Why Join Us

Inspiring creativity is at the core of TikTok's mission. Our innovative product is built to help people authentically express themselves, discover and connect – and our global, diverse teams make that possible. Together, we create value for our communities, inspire creativity, and bring joy. We strive to do great things with great people, leading with curiosity, humility, and a desire to make an impact in a rapidly growing tech company. Every challenge is an opportunity to learn and innovate as one team. We are resilient and embrace challenges. By constantly iterating and fostering an "Always Day 1" mindset, we achieve meaningful breakthroughs for ourselves, our company, and our users. Join us.

Diversity & Inclusion

TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach.

TikTok Accommodation

TikTok is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs or other reasons protected by applicable laws. If you need assistance or a reasonable accommodation, please reach out to us at https://tinyurl.com/RA-request.

Job Information

Compensation: The base salary range for this position in the selected city is $87,480 – $228,000 annually. Compensation may vary outside of this range depending on a number of factors, including a candidate’s qualifications, skills, competencies and experience, and location. Base pay is one part of the Total Package that is provided to compensate and recognize employees for their work, and this role may be eligible for additional discretionary bonuses/incentives and restricted stock units.

Benefits: Employees have day one access to medical, dental, and vision insurance, a 401(k) savings plan with company match, paid parental leave, short‑term and long‑term disability coverage, life insurance, and wellbeing benefits, among others. Employees also receive 10 paid holidays per year, 10 paid sick days per year, and 17 days of Paid Personal Time (prorated upon hire with increasing accruals by tenure). The Company reserves the right to modify or change these benefits programs at any time, with or without notice.

Fair Chance Hiring (Los Angeles County): Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state, and local laws including the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act. Consideration may be impacted by the following duties: (1) interacting with clients or colleagues, (2) handling confidential information, and (3) exercising sound judgment.
