Oracle

Sr Hardware Developer, GPU/AI and Compute

Oracle, Santa Clara, California, US, 95053

Oracle hardware development engineering, within Oracle’s Cloud Infrastructure development, is seeking a highly driven GPU Platform Hardware Engineer at the Senior Engineer level. The GPU Hardware Engineer will work within development engineering with a small team of talented engineers who lead the development and day-to-day engineering efforts for Oracle’s rapidly growing and successful Cloud AI platforms. You will participate in hardware development oversight and in-house development, design reviews, hardware integration, debug, and performance testing. You will interact closely with third-party GPU IC suppliers and partners as well as internal hardware and software development engineers. You will be a critical part of the team developing Oracle’s growing Cloud AI solutions.

Responsibilities

Review and assess third-party merchant silicon.
Evaluate system architecture and analyze proposed implementation paths.
Participate in platform definition and analysis.
Provide platform development oversight for partners.
Work with in-house engineering functional experts on design and reviews.
Support system integration, performance testing, debug, and characterization.
Support program managers on technical assessments.
Interact closely with third-party GPU IC suppliers and partners, as well as internal hardware, software development, quality assurance, cloud orchestration, and hardware and software security experts, and Oracle manufacturing teams.
Document and specify design intent and design details where appropriate, in collaboration with the appropriate engineering teams.
Participate in hardware platform security evaluations.
Guide internal Oracle partner teams on the support needed to scale, monitor, and successfully deploy our products to the Cloud.
Assist Oracle Cloud and Support teams in root-causing potential hardware or software bugs through firsthand lab replication and debug, remote debug, and calls with the appropriate teams supporting our deployed products.
Work with Oracle manufacturing teams to ensure that Oracle hardware is secure, robustly evaluated, performing at peak capability, and well qualified for deployment to our Cloud customers.

Required Qualifications

Hands-on technical experience with market-leading GPU (or alternate AI) platforms from the hardware and platform development, test, and characterization perspectives.
Good knowledge of AI/GPU platform architectures and their capabilities.
Strong understanding of, and experience running, firmware and system diagnostic tools using BMC firmware, UEFI/BIOS, and Linux tools.
Skilled in scripting to customize tests.
Demonstrated working experience with GPU supplier test code as well as open-source AI test and characterization tools.
Experience with the design and implementation of modern server platforms consisting of multiple architectures and vendors, including x86 and ARM server architectures.
Experience with hardware development at the board and FPGA levels.
Required experience with board-level ECAD tools and the ability to review hierarchical schematics, multilayer advanced board layouts, cross-board interconnect, and end-to-end connectivity analysis.
Strong communication skills, with the ability to clearly communicate complex technical issues across engineering disciplines and to articulate issues clearly and succinctly for executives.
Demonstrated experience debugging and root-causing complex issues that may have a mix of hardware and software causes.
Experience with early-stage bring-up and power-on, platform firmware debugging, and debugging of prototype GPU and CPU complexes and memory complexes.
An ability to isolate a problem to its source, and the creativity and expertise required to devise timely and robust solutions.
Experience with and understanding of the latest high-speed buses and interconnects used in modern Compute and AI platforms, including familiarity with their startup connectivity and operational robustness.

Preferred Qualifications

Demonstrated knowledge of low-level hardware component interfaces, including but not limited to PCIe, SPI, I2C (incl. SMBus, PMBus), LPC, and eSPI.
Comfortable with the use of hardware debuggers, oscilloscopes, and advanced signal characterization and measurement tools.
Experience with platform-level security technologies is an advantage in the role.

Oracle is an equal opportunity employer and is committed to diversity and inclusion. We are committed to providing a work environment that is free from discrimination and harassment.
