Amazon
Machine Learning Engineer for AWS Neuron Applications
Amazon, Arlington, Virginia, United States, 22201
AWS Neuron is a complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators. We are seeking a skilled Machine Learning Engineer to join our Machine Learning Applications (ML Apps) team, focusing on the development and performance tuning of a diverse range of ML models. These include large language models such as Llama 2, GPT-2, and GPT-3, along with Stable Diffusion models, Vision Transformers, and more.
In this role, you will work with compiler and runtime engineers to build and optimize distributed inference solutions on Trn1. A strong background in improving inference latency and throughput using languages and tools such as Python, PyTorch, and JAX is essential. Familiarity with distributed inference libraries such as DeepSpeed is highly valuable.
Key Responsibilities:
Lead the integration of distributed inference support into PyTorch and TensorFlow using XLA and the Neuron compiler and runtime.
Tune ML models for optimal performance on AWS Trainium and Inferentia silicon and on Trn1 and Trn2 servers.
Design and implement solutions that improve software architecture and drive efficiency.
Create metrics and implement automation to enhance workflows.
Collaborate with stakeholders and participate in code reviews.
Use your technical expertise to contribute to discussions that influence business decisions.
As part of our ML Apps team, you will be embraced by a culture of support, knowledge-sharing, and mentorship. We value the growth of our team members, offering opportunities to take on increasingly complex tasks in a nurturing environment.
Basic Qualifications:
3+ years of professional software development experience.
2+ years in designing or architecting systems for reliability and scaling.
Proficiency in at least one programming language.
Preferred Qualifications:
3+ years of experience through the full software development lifecycle.
Bachelor's degree in Computer Science or equivalent.
This position will remain posted until filled. Applicants are encouraged to apply through our career site. Compensation reflects various geographic markets and is based on job-related knowledge, skills, and experience; the annual salary range for this position is $129,300 to $223,600. In addition to a competitive salary, Amazon offers a comprehensive benefits package.
We are an equal opportunity employer and do not discriminate based on veteran status, disability, or any legally protected status.