Annapurna Labs Inc.
Lead Software Engineer, AI/ML, AWS Neuron, Model Optimization
Annapurna Labs Inc., Cupertino, California, United States, 95014
Requirements
Must have
Bachelor’s degree in computer science or a related field, or equivalent experience.
A minimum of 5 years of professional software development experience.
At least 5 years of experience in the design or architecture of new and existing systems, focusing on design patterns, reliability, and scalability.
Fundamental knowledge of machine learning and large language models (LLMs), including their architecture, training, and inference lifecycle, alongside experience with optimizations for enhancing model execution.
Proficiency in software development using C++ or Python (experience in at least one is required).
Strong understanding of system performance, memory management, and parallel computing principles.
Skilled in debugging, profiling, and applying best software engineering practices in large-scale systems.
Responsibilities
Lead the development of distributed inference support for PyTorch within the Neuron SDK.
Tune models to achieve the highest performance and efficiency on customers' AWS Trainium and Inferentia silicon and servers.
Design, develop, and optimize machine learning models and frameworks for deployment on specialized ML hardware accelerators.
Participate in all phases of the ML system development lifecycle: architecture design, implementation, performance profiling, optimization, testing, and production deployment.
Build infrastructure to systematically analyze and onboard a variety of models with diverse architectures.
Design and implement high-performance kernels and features for ML operations, optimizing system-level performance across generations of Neuron hardware.
Conduct detailed performance analysis using profiling tools to identify and resolve bottlenecks, and implement optimizations such as fusion, sharding, and scheduling.
Perform rigorous testing, both unit and end-to-end model testing, and ensure continuous deployment and releases through pipelines.
Collaborate directly with customers to enable and optimize their ML models on AWS accelerators, and work with cross-functional teams to develop innovative optimization techniques.
Company
Our team at Annapurna Labs within Amazon Web Services (AWS) builds AWS Neuron, a software development kit that accelerates deep learning and generative AI workloads. The Neuron SDK is essential for enhancing ML performance on Amazon's custom machine learning accelerators, Inferentia and Trainium. We pride ourselves on working across all technology layers, from frameworks to hardware, and engage directly with customers to ensure their workloads run efficiently. Our unique work culture fosters collaboration, technical ownership, and continuous learning, and prioritizes mentorship for newer team members.

This is a pioneering role at the intersection of machine learning, high-performance computing, and distributed architectures, where you will drive architectural innovations that shape the future of AI acceleration technology. Join us at the forefront of AI/ML infrastructure challenges and help build impactful solutions for our global customer base!