AMD
Senior Manager, Performance AI/ML Network Deployment Engineering
AMD, Santa Clara, California, us, 95053
Base pay range: $210,400.00/yr – $315,600.00/yr
What you do at AMD
At AMD, our mission is to build great products that accelerate next‑generation computing experiences—from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
The role
The Senior Manager, DC GPU Advanced Forward Deployment and Systems Engineering is a leadership position designed to optimize the design, roll‑out and post‑rollout management of AI/ML fabrics. You will serve as the technical interface between customers, internal engineering groups and field application engineers, leveraging extensive experience in large‑scale network architecture, storage, AI/ML network deployments and performance tuning. This role requires a disciplined approach to system triage, at‑scale debug and infrastructure optimization to ensure robust performance and efficient transitions from GPU production qualification to at‑scale datacenter deployment.
Key responsibilities
Collaborate with strategic customers on scalable designs involving compute, networking and storage environments, working with industry partners and internal teams to accelerate deployment and adoption of AI/ML models.
Engage in system‑level triage and at‑scale debug of complex issues across hardware, firmware and software, ensuring rapid resolution and system reliability.
Drive the ramp of Instinct‑based large‑scale AI datacenter infrastructure built on NPI base platform hardware with ROCm, scaling up to pod and cluster levels and leveraging best‑in‑class network architecture for AI/ML workloads.
Enhance tools and methodologies for large‑scale deployments to meet customer uptime goals and exceed performance expectations.
Engage with clients to deeply understand their technical needs, ensuring satisfaction with tailored solutions and leveraging past experience in strategic customer engagements.
Provide domain‑specific knowledge to other groups at AMD, sharing lessons learned to drive continuous improvement.
Engage with AMD product groups to resolve application and customer issues.
Develop and present training materials to internal audiences, at customer venues and industry conferences.
Preferred experience
Expertise in networking and performance optimization for large‑scale AI/ML networks, including network, compute and storage cluster design, modeling, analytics, performance tuning and scalability improvements.
Proven leadership in engaging customers with diverse technical disciplines in proof‑of‑concept, competitive evaluations, and early field trials.
Hands‑on experience with RoCEv2, VXLAN‑EVPN, BGP and lossless fabrics.
Experience working with large customers such as cloud service providers and global enterprise customers.
Strong influence on design and technology roadmaps, leveraging a deep understanding of datacenter products and market trends.
Network deployment expertise and track record of delivering large projects on time (Cisco, Juniper or Arista experience preferred).
Excellent communication skills with audiences ranging from engineers to mid‑management to C‑level executives.
Academic credentials
Bachelor’s or Master’s degree in computer science, engineering or related field.
Senior‑level role; recent college graduates will not be considered.
Ability to work well in a geographically dispersed team.
Certifications in networking, AI/ML or cloud technologies.
Benefits offered are described in AMD benefits at a glance.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee‑based recruitment services. AMD and its subsidiaries are equal‑opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third‑party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.
Seniority level Mid‑Senior level
Employment type Full‑time
Job function Semiconductor manufacturing
Location Santa Clara, CA