Amazon

Senior Software Engineer, Frontier AI & Robotics

Amazon, San Francisco, California, United States, 94199

Join the forefront of robotics innovation at Amazon's Frontier AI & Robotics team. Collaborate with world-renowned AI experts to transform groundbreaking research into high-performance production systems. As a Senior Machine Learning Engineer within our science team, you will play a vital role in optimizing advanced transformer architectures for robotics applications, utilizing your skills in CUDA and TensorRT to deliver outstanding inference efficiency on a large scale.

In this role, you will:

- Drive optimization strategies for large-scale foundation models using TensorRT, CUDA, and other advanced NVIDIA tools.
- Collaborate closely with scientists to enhance model architectures for optimal hardware use.
- Design and implement efficient compilation pipelines for complex transformer architectures.
- Develop benchmarking frameworks to analyze and optimize model performance.
- Create reliable monitoring solutions to ensure robust model serving at scale.
- Explore innovative optimization technologies, including ONNX Runtime and other ML compilers.
- Uphold high engineering standards through comprehensive testing, documentation, and code reviews.

A typical day in this role includes:

- Optimizing transformer blocks using custom CUDA kernels and TensorRT techniques.
- Partnering with scientists to evaluate model architectures and suggest improvements.
- Implementing and benchmarking various large-scale model optimization strategies.
- Debugging performance bottlenecks using NVIDIA profiling tools.
- Engaging in technical discussions about new model architectures with the science team.
- Designing and maintaining performance monitoring systems for production use.
- Prototyping new acceleration methods utilizing emerging compilation frameworks.

At Frontier AI & Robotics, we are not just advancing robotics; we are redefining it. Led by pioneering AI researchers, we address some of the toughest challenges in AI and robotics, developing systems that seamlessly integrate advanced perception and manipulation strategies in complex real-world scenarios. Be part of a team that leverages Amazon's vast computational resources and rich datasets to train and deploy cutting-edge foundation models.

Basic qualifications include:

- Bachelor's degree in computer science or equivalent.
- 5+ years of professional software development experience.
- 5+ years of programming experience in at least one software language.
- 5+ years of experience designing or architecting systems for reliability and scaling.
- Experience as a mentor or tech lead.
- Strong expertise in Python, C++, and CUDA programming.
- Experience with TensorRT or similar ML optimization frameworks.

Preferred qualifications include:

- Expertise in NVIDIA's ML stack (cuDNN, CUDA Graph, etc.).
- Experience with ML compilers (ONNX Runtime, TVM, etc.).
- Background in performance profiling and optimization.
- Direct experience working with research teams.
- Track record of building robust monitoring systems.
- Familiarity with large-scale ML serving systems.

If you're passionate about shaping the future of robotics and making a tangible impact through your work, we encourage you to apply!

This position may require you to work safely and cooperatively with other employees, adhere to high standards, communicate effectively, and comply with all laws and company policies. Consideration will be given to qualified applicants with arrest and conviction records as per local laws.