Amazon Jobs
Key job responsibilities
We're looking for an experienced Software Development Engineer with expertise in GPU and custom-chip kernel optimization and ML inference acceleration to architect, design, develop, and optimize high-performance kernel implementations for large language models. You'll create and optimize innovative kernels, custom operators, and low-level optimizations that maximize hardware utilization and minimize computational overhead.
In this role, you will build expertise in kernel development, memory management, and parallel computing techniques that dramatically reduce inference latency and boost throughput for transformer-based models. You'll develop kernel fusion techniques, attention mechanism optimizations, and matrix multiplication accelerations at scale, partnering with engineers and scientists in a fast-paced environment to deliver measurable performance gains. You'll also contribute to our technical roadmap, performance benchmarking, and kernel-level optimization efforts.
Qualifications
3+ years of non-internship professional software development experience
2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience
Experience programming with at least one software programming language
Experience with Machine and Deep Learning toolkits such as MXNet, TensorFlow, Caffe and PyTorch
3+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
Bachelor's degree in computer science or equivalent
Experience in Neuron hardware (Inferentia and Trainium chips) and NKI kernel optimization
Experience with CUDA programming and GPU kernel development, including cuDNN, cuBLAS, and other kernel-level optimization techniques
Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $129,300/year in our lowest geographic market up to $223,600/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit https://www.aboutamazon.com/workplace/employee-benefits. This position will remain posted until filled. Applicants should apply via our internal or external career site.