NVIDIA
NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI - the next era of computing - with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “The AI Computing Company.” We're looking to grow our company and establish teams with the most thoughtful people in the world.
NVIDIA HGX, MGX and DGX systems deliver the world's leading solutions for enterprise AI infrastructure at scale. With their end-to-end performance and flexibility, these systems enable researchers and scientists to combine simulation, data analytics, and AI to drive scientific progress on the most powerful end-to-end AI supercomputing platforms. Are you ready to change the next generation of computing? Join us at the forefront of technological advancement.
What you’ll be doing:
Lead and drive system bring-up for GPU-centric server platforms in factory and data center environments.
Design and implement end-to-end factory workflows, including firmware flashing sequences, security provisioning, and deployment of software mitigations.
Collaborate cross-functionally with data center architects, ODMs, and OEMs to define factory and data center requirements that ensure efficient and reliable production ramp.
Champion reliability, debuggability and optimization in firmware, diagnostic and deployment tool design.
Use AI tools to automate workflows and improve existing automation.
Troubleshoot at the speed of light, working closely with system bring-up teams on next-generation AI systems to debug and resolve issues during bring-up and deployment.
What we need to see:
5+ years of experience in data center firmware/platform software development.
BS, MS, or PhD in EE, CS, or related technical field (or equivalent experience).
Deep, hands-on expertise in working with ODMs/CSPs, firmware update design, and out-of-band management.
Proven track record of architecting and developing server firmware and diagnostic solutions for large-scale data center deployments.
Solid knowledge of hardware interfaces (USB, SMBus/I2C, PCIe) and protocols such as Redfish, MCTP, and PLDM.
Solid knowledge of debugging servers during early bring-up.
Advanced skills in C/C++ and Python, with a hands-on approach to coding and debugging during hardware bring-up.
Strong communicator, excellent collaborator, and committed team player.
Self-starter with a problem-solving mindset who thrives in a fast-paced, complex technical environment.
Ways to stand out from the crowd:
Hands-on experience with ODMs/CSPs during system bring-up and volume deployment.
Deep familiarity with x86 or ARM system architecture.
Strong networking expertise with high-speed NICs, including bring-up and configuration in a factory environment.
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative and autonomous, we want to hear from you!
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 148,000 USD - 235,750 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.
You will also be eligible for equity and benefits (https://www.nvidia.com/en-us/benefits/).
Applications for this job will be accepted at least until October 25, 2025.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.