Amazon

Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training - Performance Optimization

Amazon, San Luis Obispo, California, US, 93403

Overview

AWS Utility Computing (UC) provides product innovations that set AWS’s services apart. This role supports the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, with exposure to generative AI services and other cloud offerings across the AWS portfolio.

Annapurna Labs designs silicon and software that accelerates innovation, with custom chips, accelerators, and software stacks for cloud solutions. AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale ML accelerators and the Trn1 and Inf1 servers.

This role is for a senior software engineer in the Machine Learning Applications (ML Apps) team for AWS Neuron. The role is responsible for development, enablement, and performance tuning of ML model families, including large language models and other architectures, on Neuron hardware.

Responsibilities

Lead efforts to build distributed training and inference support into PyTorch, TensorFlow, and JAX using XLA and the Neuron compiler and runtime stacks.
Tune models to achieve the highest performance and efficiency on AWS Trainium and Inferentia silicon, including Trn1 and Inf1 servers.
Collaborate with chip architects, compiler engineers, and runtime engineers to create, build, and tune distributed training solutions for Neuron-based systems.
Work with distributed training libraries such as FairScale (FSDP), DeepSpeed, and related tooling to extend support for Neuron-based systems.
Train large models using Python and apply distributed training techniques to scale model training.

Qualifications

Basic Qualifications

5+ years of non-internship professional software development experience
5+ years of programming experience in at least one programming language
5+ years of experience in design or architecture of systems (design patterns, reliability, scaling)
5+ years of full software development lifecycle experience (coding standards, code reviews, source control, build, test, operate)
Experience as a mentor, tech lead, or leading an engineering team

Preferred Qualifications

Bachelor’s degree in computer science or equivalent
Machine Learning knowledge in frameworks and end-to-end model training

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

If you require a workplace accommodation or adjustment during the application and hiring process, including interview or onboarding support, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

Our compensation reflects the cost of labor across US geographic markets. The base pay for this position ranges from $151,300/year to $261,500/year, depending on location and experience. Amazon is a total compensation company; equity, sign-on, and other benefits may be provided as part of the package. For more information, visit https://www.aboutamazon.com/workplace/employee-benefits.

This position will remain posted until filled. Applicants should apply via our internal or external career site.

About the Team

Our team is dedicated to supporting new members with a mentoring-focused culture, thorough code reviews, and career growth opportunities. We value knowledge-sharing and strive to assign projects that help you grow as an engineer. We encourage candidates to apply even if they do not meet all listed qualifications or follow a traditional path, as AWS values diverse experiences.

Location

Location: ES, Community of Madrid, Madrid

For current government employees: Before proceeding, review the FAQs at https://www.amazon.jobs/en/faqs#faqs-for-us-government-employees.
