Amazon Jobs

Senior Software Development Engineer, AI/ML, AWS Neuron, Model Inference

Amazon Jobs, Seattle, Washington, US, 98127


Overview

The Annapurna Labs team at Amazon Web Services (AWS) builds AWS Neuron, the software development kit used to accelerate deep learning and GenAI workloads on Amazon's custom ML accelerators, Inferentia and Trainium. The Neuron SDK comprises an ML compiler, runtime, and application framework, and integrates with popular ML frameworks such as PyTorch and JAX to enable high-performance inference and training. The Inference Enablement and Acceleration team operates across the stack, from PyTorch down to the hardware-software boundary, building infrastructure, inventing methods, and writing high-performance ML kernels that tune the compute units to customer workloads. The team combines hardware knowledge with ML expertise to advance AI acceleration, and works with customers and the open source ecosystem to enable model performance that integrates seamlessly at scale. As part of the broader Neuron organization, the team collaborates across multiple technology layers (frameworks, kernels, compiler, runtime, and collectives) to optimize current performance and contribute to future architecture designs. This role offers the opportunity to work at the intersection of machine learning, high-performance computing, and distributed architectures to shape the future of AI acceleration technology.
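To ground the PyTorch integration described above, here is a minimal, hedged sketch of compiling and running a Hugging Face model with torch-neuronx; the model choice and exact API usage are illustrative assumptions drawn from the public Neuron documentation linked below, not part of this posting.

```python
# Hedged sketch: compile a PyTorch model for Inferentia/Trainium via torch-neuronx.
# Model choice and API details are assumptions based on public Neuron docs.
import torch
import torch_neuronx  # AWS Neuron's PyTorch integration
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# torchscript=True makes the forward pass return tuples, which trace cleanly.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", torchscript=True
)
model.eval()

# Example inputs drive ahead-of-time tracing/compilation for the accelerator.
encoding = tokenizer("AWS Neuron accelerates inference.", return_tensors="pt")
example_inputs = (encoding["input_ids"], encoding["attention_mask"])

# trace() compiles the model into a graph executable on NeuronCores.
neuron_model = torch_neuronx.trace(model, example_inputs)

# Inference now runs on the Neuron device rather than CPU/GPU.
with torch.no_grad():
    logits = neuron_model(*example_inputs)[0]
print(logits)
```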

Responsibilities

You will architect and implement business-critical features and mentor a team of experienced engineers in a fast-moving environment. The team is small and agile, inventing and experimenting without a blueprint while collaborating closely with customers to enable their models and optimize ML workloads on AWS accelerators. You will work with customers and open source ecosystems to deliver peak performance at scale. Key responsibilities include:

• Lead efforts to build distributed inference support for PyTorch in the Neuron SDK, tuning models for the highest performance on AWS Trainium and Inferentia.
• Develop and performance-tune a wide variety of LLM model families, including 500B+ parameter models such as Llama and DeepSeek.
• Collaborate with performance, compiler, and runtime engineers to create, build, and tune distributed inference solutions on Trainium and Inferentia.
• Build infrastructure to systematically analyze and onboard multiple models with diverse architectures.
• Collaborate with performance teams to enable and evaluate optimizations such as fusion, sharding, tiling, and scheduling.
• Conduct comprehensive testing, including unit and end-to-end model testing with CI/CD pipelines.
• Work directly with customers to enable and optimize their ML models on AWS accelerators.
• Collaborate across teams to develop innovative optimization techniques.
• Build online/offline inference serving with vLLM, SGLang, TensorRT, or similar platforms in production environments (see the sketch after this list).
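As a rough illustration of the serving responsibility above, the following hedged sketch runs offline batch inference through vLLM's Python API; the model name, parallelism degree, and any Trainium/Inferentia backend selection are assumptions that would depend on the specific vLLM and Neuron builds in use.

```python
# Hedged sketch: offline batch inference with vLLM's Python API.
# Model, parallelism degree, and backend selection are illustrative assumptions.
from vllm import LLM, SamplingParams

prompts = [
    "Explain tensor parallelism in one sentence.",
    "What does an ML compiler do?",
]
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=64)

# tensor_parallel_size shards the model across accelerator cores; how the
# Trainium/Inferentia device backend is chosen depends on the vLLM build.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=2)

for output in llm.generate(prompts, sampling_params):
    print(output.prompt, "->", output.outputs[0].text)
```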

Qualifications

• 5+ years of non-internship professional software development experience
• 5+ years of non-internship design or architecture experience (design patterns, reliability, and scaling)
• Fundamentals of machine learning and LLMs, including their architectures and training and inference lifecycles, with experience optimizing model execution
• Experience programming with at least one software language
• 5+ years of full software development life cycle experience (coding standards, code reviews, source control, build, testing, operations)
• Master's degree in computer science or equivalent

EEO and Benefits

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status. Compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $151,300/year to $261,500/year, with variations based on market, knowledge, skills, and experience. This position may also include equity, sign-on payments, and other forms of compensation as part of a total package. For more information on benefits, please visit https://www.aboutamazon.com/workplace/employee-benefits.

For more information about AWS Neuron, please visit:
https://awsdocs-neuron.readthedocs-hosted.com
https://aws.amazon.com/machine-learning/neuron/
https://github.com/aws/aws-neuron-sdk
https://www.amazon.science/how-silicon-innovation-became-the-secret-sauce-behind-awss-success

Los Angeles County applicants: job duties include working safely, adhering to standards, communicating effectively, and complying with laws and policies. Criminal history may affect eligibility; we will consider qualified applicants with arrest and conviction records in accordance with the Los Angeles County Fair Chance Ordinance.

Our inclusive culture empowers Amazonians to deliver the best results. If you need a workplace accommodation during the application or hiring process, please visit the accommodations page. If the country/region you're applying in isn't listed, contact your Recruiting Partner.

This position will remain posted until filled. Applicants should apply via our internal or external career site.
