Sustainable Talent

Senior Lab and Infrastructure Support Engineer

Sustainable Talent, Santa Clara, California, US, 95053



Sustainable Talent is partnering with NVIDIA, a global leader in transforming computer graphics, PC gaming, and accelerated computing for over 25 years. We are looking for a Senior Lab and Infrastructure Support Engineer to support our client's dynamic team responsible for maintaining and optimizing Colossus cloud infrastructure, including data centers and labs. This is a W-2 full-time contract based in Santa Clara, CA. We offer competitive pay of $70/hr to $80/hr, based on factors such as experience, education, and location, and provide full benefits, PTO, and a strong company culture.

The ideal candidate will have a strong technical background, excellent problem-solving skills, and a passion for ensuring the reliability and efficiency of our infrastructure. You will provide a test bed for developers to test software on various NVIDIA hardware before release. You will also collaborate with Infrastructure Engineers, installing and maintaining Windows/Linux platforms and finding creative solutions. Our labs run more than 100,000 tests per day and are part of a DevOps pipeline that requires constant supervision, tracking, monitoring, and break-fix.

What you'll be doing

- Assist in the installation, configuration, and deployment of new hardware and software components.
- Conduct regular inspections and audits of infrastructure components to identify and address potential issues proactively.
- Collaborate with cross-functional teams to implement and test new technologies and solutions.
- Document maintenance activities, troubleshooting procedures, and system configurations.
- Participate in on-call rotation and respond to emergency situations as needed.
- Manage labs and datacenters using DCIM tools, spreadsheets, and task tracking tools.
- Define standards in labs to keep them safe, clean, and organized.
- Perform routine maintenance tasks on servers, networking equipment, and other infrastructure components in data centers and labs.
- Troubleshoot hardware and software issues to ensure uninterrupted operation of critical systems.
- Deploy test boards that run automated tests from software developers and triage/root-cause issues that may be due to test setup rather than hardware or software.
- Remove and redeploy boards that require software and/or hardware upgrades from board engineers on a regular cadence.
- Collaborate with system architects, chip and board designers, software/firmware engineers, HW/SW QA teams, and Applications engineering teams to drive design, development, debug, and release of next-generation products.
- Participate in procurement decisions for the lab, evaluate options, run proofs of concept, and provide recommendations.
- Collect data for critical metrics for the lab and track progress.

What we need to see

- Associate's or Bachelor's degree in a tech-related major, or 4+ years of equivalent experience in a lab or datacenter environment.
- Ability to perform well at work without requiring constant supervision.
- Ability to deploy and cable servers and test equipment.
- Proven experience with data center infrastructure, including servers, storage systems, and networking equipment.
- Strong knowledge of hardware components.
- Basic user-level understanding of Unix/Windows, and of networking with enterprise switches and routers.
- Ability to work with teammates of various abilities and experiences.
- Ability to identify tasks requiring input from sysadmins and coordinate to integrate solutions.
- Persistence to debug hard problems and out-of-the-box thinking to find solutions.
- Interest in working with close-knit, multi-disciplinary teams and hands-on work with state-of-the-art platforms.

Ways to stand out from the crowd

- Visio and CAD experience for Lab R&D projects and rack management.
- Lab/datacenter procurement experience.
- Experience handling PDUs and power in labs.
- System administrator level experience on Unix/Windows and scripting to automate workflows (bash/python).
- Basic knowledge of Git/Perforce for version control of scripts.
- Ability to write SQL queries to retrieve data from MySQL databases.

Sustainable Talent is an equal employment opportunity employer (M/F+, disabled, and veteran). We do not discriminate on the basis of protected status under applicable law.
