ByteDance

Software Engineer in ML Systems Graduate (AML - Machine Learning Systems) - 2026

ByteDance, Seattle, Washington, US, 98127


Software Engineer in ML Systems Graduate (AML - Machine Learning Systems) - 2026 Start (BS/MS)

Responsibilities

We are looking for talented individuals to join our team in 2026. As a graduate, you will get opportunities to pursue bold ideas, tackle complex challenges, and unlock limitless growth. Launch your career where inspiration is infinite at ByteDance.

Successful candidates must be able to commit to an onboarding date by the end of 2026. Please state your availability and graduation date clearly in your resume.

The AML Machine Learning Systems team provides an end-to-end machine learning experience and machine learning resources for the company. The team builds heterogeneous ML training and inference systems based on GPUs and AI chips, and advances the state of the art in ML systems technology to accelerate models such as Stable Diffusion and LLMs.

Responsibilities include:
- Research and develop our machine learning systems, including heterogeneous computing architecture, management, and monitoring
- Deploy machine learning systems, distributed task scheduling, and machine learning training
- Manage cross-layer optimization of systems, AI algorithms, and hardware for machine learning (GPU, ASIC)
- Implement both general-purpose training framework features and model-specific optimizations (e.g. LLMs, diffusion models)
- Improve efficiency and stability for extremely large-scale distributed training jobs

Qualifications

Minimum Qualifications:
- Mastery of distributed and parallel computing principles; knowledge of recent advances in computing, storage, networking, and hardware technologies
- Familiarity with machine learning algorithms, platforms, and frameworks such as PyTorch and JAX
- Basic understanding of how GPUs and/or ASICs work
- Expertise in at least one or two programming languages in a Linux environment: C/C++, CUDA, Python

Preferred Qualifications:

- GPU-based high-performance computing and RDMA high-performance networking (MPI, NCCL, ibverbs)
- Distributed training frameworks and optimizations such as DeepSpeed, FSDP, Megatron, and GSPMD
- AI compiler stacks such as torch.fx, XLA, and MLIR
- Large-scale data processing and parallel computing
- Experience designing and operating large-scale systems in cloud computing or machine learning
- Experience with in-depth CUDA programming and performance tuning (CUTLASS, Triton)

Job Information

The base salary range for this position in the selected city is $112,725 - $177,840 annually. Compensation may vary outside of this range depending on a number of factors, including a candidate's qualifications, skills, competencies and experience, and location. Benefits may vary depending on the nature of employment and the country work location. Employees have day-one access to medical, dental, and vision insurance, a 401(k) savings plan with company match, paid parental leave, short-term and long-term disability coverage, life insurance, and wellbeing benefits, among others.

ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace.
