Vantage Point Consulting Inc.
JOB DESCRIPTION
Required Skills:
- Deep understanding of hardware designs and subsystems (BMC, PCIe, CPU, GPU, etc.)
- Proven experience qualifying hardware designs for production release (SKU Qual)
- Experience testing component subsystems for use in existing SKUs (Component Qual)
- Deep Linux systems experience, including troubleshooting network interfaces
- Experience developing and applying configuration management, security best practices, and monitoring and alerting
- Experience with firmware testing and deployment (Firmware Qual)
- Strong automation mindset
- Expert knowledge of one or more orchestration tools such as Salt, Chef, Ansible, or Puppet, and strong Python skills
- Strong communication skills: your job will involve writing detailed documentation for others to pick up or leading knowledge-sharing sessions with operations teams
- Hands-on experience with High Performance Computing (HPC) clustered environments built on Nvidia or AMD hardware
- Experience performing automated, large-scale testing with NCCL or other frameworks
- Hands-on experience in qualification automation, with a specific focus on developing tests within an automation framework for hands-free qualification
Responsibilities:
- Onsite support of our hardware qualification efforts in NYC3 and SFO2
- Hardware qualification of new server SKUs for Compute and GPU Hypervisor, Storage, and Infrastructure server hardware
- Hardware validation against design targets (functional and performance related)
- Hardware reconfiguration to support different testing efforts (changes to server components)
- Troubleshooting hardware integration with the platform's operational tooling (onboarding)
- Firmware validation and qualification
- Performance testing, analysis, and monitoring
- Firmware, BIOS, and kernel upgrades and testing