Amazon Web Services (AWS)
Software Engineer - AI/ML, AWS Neuron Inference
Amazon Web Services (AWS), Seattle, Washington, US, 98127
Description
AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud‑scale machine learning accelerators. This role is for a senior software engineer on the Machine Learning Inference Applications team, responsible for the development and performance optimization of the core building blocks of LLM inference: attention, MLP, quantization, speculative decoding, Mixture of Experts, and more. The team works side by side with chip architects, compiler engineers, and runtime engineers to deliver performance and accuracy on Neuron devices across a range of models such as Llama 3.3 70B, Llama 3.1 405B, DBRX, and Mixtral.
Key Responsibilities
Adapt the latest research in LLM optimization to Neuron chips to extract the best performance from both open‑source and internally developed models
Work across teams and organizations
Collaborate with developers and researchers to iterate on and test performance improvements
About the Team
Our team supports new members, spans a mix of experience levels, and celebrates knowledge sharing and mentorship. Senior members offer one‑on‑one mentoring and thorough, kind code reviews. We care about career growth and assign projects that help grow engineering expertise, so team members feel empowered to take on more complex tasks.
Basic Qualifications
3+ years of non‑internship professional software development experience
2+ years of non‑internship design or architecture (design patterns, reliability and scaling) of new and existing systems
Programming proficiency in Python or C++ (at least one required)
Experience with PyTorch
Working knowledge of Machine Learning and LLM fundamentals including transformer architecture, training/inference lifecycles, and optimization techniques
Strong understanding of system performance, memory management, and parallel computing principles
Preferred Qualifications
Experience with JAX
Experience with debugging, profiling, and implementing software engineering best practices in large‑scale systems
Expertise with PyTorch, JIT compilation, and AOT tracing
Experience with CUDA kernels or equivalent ML/low‑level kernels
Experience with performant kernel development (e.g., CUTLASS, FlashInfer)
Experience with inference serving platforms (vLLM, SGLang, TensorRT) in production environments
Deep understanding of computer architecture, operating systems, and parallel computing
Company
Annapurna Labs (U.S.) Inc.
Compensation
The base pay for this position ranges from $129,300/year in our lowest geographic market up to $223,600/year in our highest geographic market. Pay is based on market location and may vary depending on job‑related knowledge, skills, and experience. Amazon is a total compensation company. Equity, sign‑on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and other benefits.
Equal Opportunity Employment
Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status. Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.