Amazon

Senior Software Engineer - AI/ML, Distributed Training for AWS Neuron

Amazon, Austin, Texas, US, 78716

Join our team as a Senior Software Engineer in the AI/ML space, focusing on Distributed Training for AWS Neuron. At Annapurna Labs, we are at the forefront of innovation, designing advanced silicon and software solutions that empower customers to tackle unprecedented challenges. Our custom chips, accelerators, and robust software stacks are revolutionizing cloud solutions.

In this role, you will take charge of developing, enabling, and fine-tuning performance for a diverse range of machine learning models, including large-scale models like GPT, Llama, and Stable Diffusion. You will collaborate closely with chip architects, compiler engineers, and runtime engineers to create and optimize distributed training solutions utilizing Trainium instances.

Key Responsibilities:
Lead the integration of distributed training support in PyTorch and JAX using XLA, along with the Neuron compiler and runtime stacks.
Optimize models for peak performance on AWS custom silicon, including Trainium and Inferentia across various server types.
Utilize strong software development skills while collaborating across teams to drive results.
Champion best practices in machine learning while mentoring and guiding junior engineers.

Basic Qualifications:
Bachelor's degree in Computer Science or equivalent.
5+ years of professional software development experience.
5+ years of programming experience in at least one major programming language.
5+ years leading design or architecture of new and existing systems.
5+ years managing the full software development life cycle.
Proficiency in machine learning, data mining, information retrieval, statistics, or natural language processing.

Preferred Qualifications:
Master's degree in Computer Science or equivalent.
Experience in computer architecture.
Previous expertise in using frameworks like PyTorch, JAX, and TensorFlow.
Familiarity with distributed training libraries and end-to-end model training.

At Amazon, we are committed to creating a diverse and inclusive workplace. We value all qualified applicants and do not discriminate based on race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status. We strive to empower our team members to deliver exceptional results for our customers.

The base pay for this position ranges from $151,300 per year in our lowest geographic market to $261,500 per year in our highest geographic market, depending on various factors, including job-related knowledge, skills, and experience. In addition to base pay, we offer a comprehensive compensation package, including equity, sign-on payments, and various benefits. For more details, please inquire during the application process.