ByteDance
Software Engineer in ML Systems Graduate (AML - Machine Learning Systems) - 2026 Start (BS/MS)
ByteDance, Seattle, Washington, US, 98127
Overview
The AML Machine Learning Systems team provides end-to-end machine learning experience and resources for the company. The team builds heterogeneous ML training and inference systems based on GPUs and AI chips, and advances the state of the art in ML systems technology to accelerate models such as Stable Diffusion and LLMs. The team is responsible for research and development of hardware acceleration technologies for AI and cloud computing via distributed systems, compilers, HPC, and RDMA networking. We are reinventing the ML infrastructure for large-scale language models. Successful candidates must be able to commit to an onboarding date by the end of 2026. Please state your availability and graduation date clearly in your resume. Candidates can apply to a maximum of two positions; applications are reviewed on a rolling basis.
Responsibilities
- Research and develop machine learning systems, including heterogeneous computing architecture, management, and monitoring
- Deploy machine learning systems, distributed task scheduling, and machine learning training
- Manage cross-layer optimization of systems, AI algorithms, and hardware for machine learning (GPU, ASIC)
- Implement both general-purpose training framework features and model-specific optimizations (e.g., LLMs, diffusion models)
- Improve efficiency and stability for extremely large-scale distributed training jobs
Qualifications
Minimum Qualifications
- Mastery of distributed and parallel computing principles; knowledge of recent advances in computing, storage, networking, and hardware technologies
- Familiarity with machine learning algorithms, platforms, and frameworks such as PyTorch and JAX
- Basic understanding of how GPUs and/or ASICs work
- Expertise in at least one or two programming languages in a Linux environment: C/C++, CUDA, Python
Preferred Qualifications
- GPU-based high-performance computing; RDMA high-performance networking (MPI, NCCL, ibverbs)
- Distributed training framework optimizations such as DeepSpeed, FSDP, Megatron, GSPMD
- AI compiler stacks such as torch.fx, XLA, and MLIR
- Large-scale data processing and parallel computing
- Experience in designing and operating large-scale systems in cloud computing or machine learning
- Experience in in-depth CUDA programming and performance tuning (CUTLASS, Triton)
By submitting an application for this role, you accept and agree to our global applicant privacy policy, available here: https://jobs.bytedance.com/en/legal/privacy
Job Information
Compensation
The base salary range for this position in the selected city is $112,725 - $177,840 annually. Compensation may vary outside this range based on qualifications, skills, competencies, experience, and location. Base pay is part of the Total Package; the role may be eligible for bonuses, incentives, and restricted stock units. Benefits may vary by country and employment type. Employees have day-one access to medical, dental, and vision insurance, a 401(k) with company match, paid parental leave, short-term and long-term disability, life insurance, wellbeing benefits, and more. Employees also receive 10 paid holidays, 10 paid sick days, and 17 days of Paid Personal Time (prorated upon hire and increasing with tenure). The Company reserves the right to modify benefits programs at any time, with or without notice.
Reasonable Accommodations
ByteDance is committed to providing reasonable accommodations in its recruitment processes for candidates with disabilities or other protected reasons. If you need assistance, please reach out to us at https://tinyurl.com/RA-request
About ByteDance Doubao (Seed)
Founded in 2023, the ByteDance Doubao (Seed) Team aims to pioneer AI foundation models, leading cutting-edge research and driving technological and societal advancements. Our research areas span deep learning, reinforcement learning, language, vision, audio, AI Infra, and AI Safety, with labs and positions across China, Singapore, and the US.
Why Join ByteDance
ByteDance strives to inspire creativity, connect people, and create value for communities. We foster curiosity, humility, and impact, operating with an "Always Day 1" mindset to achieve meaningful breakthroughs for our Company and users.
Diversity & Inclusion
We are committed to an inclusive space where employees are valued for their skills, experiences, and perspectives. We celebrate diverse voices and strive to reflect the communities we reach.
Seniority level
Internship
Employment type
Full-time
Job function
Engineering and Information Technology
Industries: Software Development