ByteDance
ByteDance, Seattle, Washington, US, 98127
Research Scientist Graduate (High-Performance Computing (Inference Optimization) - Vision AI Platform-Seattle) - 2025 Start (PhD)
Responsibilities:
Design and develop next-generation large model inference engines, optimizing GPU cluster performance for image/video generation and multimodal models to achieve industrial-grade low-latency, high-throughput deployment.
Lead inference optimization, including CUDA/Triton kernel development, TensorRT/TRT-LLM graph optimization, distributed inference strategies, quantization techniques, and PyTorch-based compilation (torch.compile).
Build a GPU inference acceleration stack with multi-GPU collaboration, PCIe optimization, and high-concurrency service architecture design.
Collaborate with algorithm teams on performance bottleneck analysis, software-hardware co-design for vision model deployment, and AI infrastructure ecosystem development.

Qualifications:
Bachelor’s/Master’s or above in Computer Science, EE, or related fields.
Proficient in C++/Python and high-performance coding.
Expertise in at least one of: GPU programming (CUDA/Triton/TensorRT), model quantization (PTQ/QAT), parallel computing (multi-GPU/multi-node inference), or compiler optimization (TVM/MLIR/XLA/torch.compile).
Deep understanding of Transformer architectures and LLM/VLM/Diffusion model optimization.

Preferred Qualifications:
Experience in large-scale inference systems, vLLM/TGI customization, and advanced quantization/sparsity.

About Doubao (Seed):
Founded in 2023, the ByteDance Doubao (Seed) Team focuses on pioneering advanced AI foundation models, with labs and research positions across China, Singapore, and the US.

Compensation and Benefits:
The base salary range for this position in the selected city is $177,688 - $341,734 annually. Compensation may vary based on qualifications, skills, competencies, experience, and location. Base pay is part of the Total Package, which may also include discretionary bonuses/incentives and RSUs.
Benefits vary by location. They include day-one medical, dental, and vision insurance; 401(k) with company match; paid parental leave; short- and long-term disability; life insurance; wellbeing benefits; 10 paid holidays, 10 paid sick days, and 17 days of Paid Personal Time (prorated).

Legal notices and accommodations:
ByteDance is committed to equal opportunity and inclusion. Reasonable accommodations in recruitment processes are available for candidates with disabilities or for other protected reasons. If you need assistance, contact us at https://tinyurl.com/RA-request.

Why Join ByteDance:
We value creativity, curiosity, and collaboration. Our diverse teams aim to make meaningful breakthroughs and grow together in a fast-paced tech environment.

Application and start dates:
Successful candidates must be able to commit to onboarding by the end of 2025, and start dates will be prioritized accordingly. Applications are reviewed on a rolling basis. You may apply for up to two positions; we review in the order of application.

Location notes: Seattle, WA