Amazon

Senior Software Development Engineer - Machine Learning Engineer, Frontier AI Ro

Amazon, Seattle, Washington, US, 98127


We are looking for a highly skilled Machine Learning Systems Engineer to join our Frontier AI Robotics team. This exciting role emphasizes the development and optimization of distributed training infrastructure for large-scale machine learning models, especially in deep learning and transformer-based architectures. You will collaborate with top scientists and engineers to create scalable, high-performance systems that drive cutting-edge AI research and applications.

About the Team:
At Frontier AI & Robotics, we are not just advancing robotics; we are reimagining it. Our mission is to construct the future of intelligent robotics utilizing frontier foundation models and end-to-end learned systems. We face some of the toughest challenges in AI and robotics, from developing advanced perception systems to crafting adaptive strategies for manipulation in complex real-world scenarios. Our unique blend of ambitious research vision and practical impact makes us stand out. We capitalize on Amazon's substantial computational resources and extensive real-world datasets to train and deploy state-of-the-art foundation models. Our work covers the entire spectrum of robotics intelligence, ranging from multimodal perception using images, videos, and sensor data, to sophisticated manipulation techniques designed to excel in diverse real-world situations. We are committed to building systems that not only operate in a lab setting but are also scalable to meet the needs of Amazon's global operations.

Join us if you are passionate about pushing the boundaries of what is possible in robotics, eager to collaborate with world-class researchers, and excited about witnessing your innovations deployed at an unprecedented scale.

Basic Qualifications:
5+ years of non-internship professional software development experience
5+ years of programming experience in at least one software programming language
5+ years leading design or architecture of new and existing systems
Experience mentoring, serving as a tech lead, or leading an engineering team
Design, build, and optimize machine learning infrastructure for large-scale training and inference
Proficient in applying PyTorch, Python, and C++ to engineer modular and scalable ML systems
Familiarity with parallelism techniques such as data, tensor, model, and pipeline parallelism
Ability to monitor and optimize GPU memory and throughput for efficient training of large models
Experience collaborating cross-functionally with research and data infrastructure teams to integrate new models and features
Deep understanding of LLM algorithms and deep learning frameworks like PyTorch
Strong foundation in mathematics and statistics including linear algebra, calculus, probability, and statistics

Preferred Qualifications:
5+ years of experience with the full software development lifecycle, including coding standards, code reviews, source control management, build processes, testing, and operations
Bachelor's degree in computer science or equivalent

We are an equal opportunity employer and do not discriminate based on protected veteran status, disability, or other legally protected status. Applicants should apply via our internal or external career site.

Our compensation reflects the cost of labor across various US geographic markets. The base pay for this position ranges from $151,300/year in our lowest geographic market to $261,500/year in our highest geographic market.
Pay varies based on factors such as market location and job-related knowledge, skills, and experience. Additionally, equity, sign-on payments, and other forms of compensation may be included in the total compensation package, along with a full range of medical, financial, and/or other benefits. This position will remain posted until filled.