NVIDIA
Deep Learning Solutions Architect – Inference Optimization
NVIDIA, Italy, New York, United States
NVIDIA’s Worldwide Field Operations (WWFO) team is seeking a Solutions Architect with a deep understanding of neural network inference. As customers adopt increasingly complex inference pipelines on state‑of‑the‑art infrastructure, there is a growing need for experts who can guide the integration of advanced inference techniques such as speculative decoding, request‑scheduler optimizations or FP4 quantization. The ideal candidate will be proficient with tools such as TensorRT‑LLM, vLLM, SGLang or similar, and will have strong systems knowledge, enabling customers to make full use of the new GB300 NVL72 systems. Examples include working on efficient KV cache offloading, helping with inference of new architectures such as hybrid or diffusion models, and architecting pre‑ and post‑processing pipelines.
Solutions Architects work with the most exciting computing hardware and software, driving the latest breakthroughs in artificial intelligence. We need individuals who can enable customer productivity and develop lasting relationships with our technology partners, making NVIDIA an integral part of end‑user solutions. We look for someone who is passionate about artificial intelligence, keeps up with rapid field changes, and can coordinate efforts between marketing, business development and engineering.
What You Will Be Doing
Work directly with key customers to understand their technology and provide the best AI solutions.
Perform in‑depth analysis and optimization to ensure the best performance on GPU‑based systems, in particular Grace/ARM‑based systems, and support optimization of large‑scale inference pipelines.
Partner with Engineering, Product and Sales teams to develop and plan the most suitable solutions for customers, and enable the development and growth of product features through customer feedback and proof‑of‑concept evaluations.
What We Need To See
Excellent verbal and written communication and technical presentation skills in English.
MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics or other Engineering fields.
5+ years of work or research experience with Python, C++ or other software development.
Work experience and knowledge of modern NLP, including a good understanding of transformer, state‑space, diffusion and MoE model architectures, and expertise in training or optimization/compression/operation of DNNs.
Understanding of key libraries used for NLP/LLM training (e.g. Megatron‑LM, NeMo, DeepSpeed) and/or deployment (TensorRT‑LLM, vLLM, Triton Inference Server).
Enthusiastic about collaborating with various teams and departments; thrives in dynamic environments and stays focused amid constant change.
Self‑starter with a growth mindset and a passion for continuous learning and sharing findings across the team.
Ways To Stand Out From The Crowd
Demonstrated experience in running and debugging large‑scale distributed deep learning training or inference processes.
Experience working with larger transformer‑based architectures for NLP, CV, ASR or other domains.
Experience applying NLP technology in production environments.
Proficient with DevOps tools including Docker, Kubernetes and Singularity.
Understanding of HPC systems: data center design, high‑speed interconnects such as InfiniBand, cluster storage, and scheduling‑related design and/or management experience.
Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family at www.nvidiabenefits.com.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. We do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
Seniority level: Mid-Senior level
Employment type: Full‑time
Job function: Computer Hardware Manufacturing, Software Development, and Computers and Electronics Manufacturing