Oracle

Principal Software Developer, GPU/AI and Compute Platforms

Oracle, Santa Clara, California, US, 95053


Oracle hardware development engineering, within Oracle’s Cloud Infrastructure development, is seeking a highly driven GPU Platform Hardware Engineer at the Senior Engineer level. The GPU Hardware Engineer will work within development engineering with a small team of talented engineers who lead the development and day-to-day engineering efforts for Oracle’s rapidly growing and successful Cloud AI platforms.

Position Overview

Our Design Engineering organization is looking for a highly driven, capable, and dedicated Principal Engineer to join the team developing the next-generation AI platform for Cloud. Our products feed solutions into the growing and successful Oracle Cloud for Compute, AI Storage, and networking.

Responsibilities

Your responsibilities will include, but are not limited to, the following:
Review and assess third-party merchant silicon.
Evaluate system architecture and analyze proposed implementation paths.
Participate in platform definition and analysis.
Provide platform development oversight for partners.
Work with in-house engineering functional experts on design and reviews.
Support system integration, performance testing, debug, and characterization.
Support program managers on technical assessments.
Interact closely with third-party GPU IC suppliers and partners, as well as internal hardware, software development, quality assurance, cloud orchestration, hardware and software security experts, and Oracle manufacturing teams.
Document and specify design intent and design details where appropriate, in collaboration with the appropriate engineering teams.
Participate in hardware platform security evaluations.
Guide internal Oracle partner teams on the support needed to scale, monitor, and successfully deploy our products to the Cloud.
Assist Oracle Cloud and Support teams in root-causing potential hardware or software bugs through firsthand lab replication and debug, remote debug, and calls with the appropriate teams supporting our deployed products.
Work with Oracle manufacturing teams to ensure that Oracle hardware is secure, robustly evaluated, performing at peak capability, and well qualified for deployment to our Cloud customers.

What This Role Looks Like

Work directly with hardware design and development teams on architecture, implementation, development, deployment, and troubleshooting of AI hardware platforms.

Develop, implement, and run the day-to-day execution of AI platform development, both internally and in partnership with third-party design teams.

Work closely and collaborate with hardware developers, system architects, system engineers, technical leads, platform firmware developers, partners and AI chip / GPU suppliers, and storage, networking, and compute experts on product development, and then with Manufacturing and external suppliers across the new product introduction process out to production.

Required Qualifications

Technical hands-on experience with market-leading GPUs (or alternative AI platforms) from the hardware and platform development, test, and characterization perspectives.
Good knowledge of AI / GPU platform architectures and their capabilities.
A strong understanding of, and experience running, firmware and system diagnostics tools using BMC firmware, UEFI/BIOS, and Linux tools.
Demonstrated working experience with GPU supplier test code as well as open-source AI test and characterization tools.
Experience with the design and implementation of modern server platforms consisting of multiple architectures and vendors, including x86 and ARM server architectures.
Experience with hardware development at the board and FPGA levels.
Required experience with board-level ECAD tools and the ability to review hierarchical schematics, advanced multilayer board layout, cross-board interconnect, and end-to-end connectivity analysis.
Strong communication skills and the ability to clearly communicate complex technical issues across engineering disciplines, as well as to articulate issues clearly and succinctly for executives.
Demonstrated experience debugging and root-causing complex issues that may have a mix of hardware and software causes.
Experience with early-stage bring-up and power-on, platform firmware debugging, and prototype GPU and CPU complex and memory complex debugging.
An ability to isolate a problem to its source, and the creativity and expertise required to devise timely and robust solutions.
Experience with and understanding of the latest high-speed buses and interconnects used in modern Compute and AI platforms.

Preferred Qualifications

Demonstrated knowledge of low-level hardware component interfaces, including, but not limited to, PCIe, SPI, I2C (incl. SMBus, PMBus), LPC, and eSPI.
Comfortable with the use of hardware debuggers, oscilloscopes, and advanced signal characterization measurement tools.
Experience with platform-level security technologies is an advantage in the role.

Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.

Range and benefit information provided in this posting is specific to the stated locations only.

US: Hiring range in USD from $96,800 to $223,400 per annum. May be eligible for bonus and equity.

Oracle US offers a comprehensive benefits package which includes medical, dental, and vision insurance, including expert medical opinion; short-term disability and long-term disability; life insurance and AD&D; supplemental life insurance; health care and dependent care Flexible Spending Accounts; pre-tax commuter and parking benefits; 401(k) Savings and Investment Plan with company match; paid time off, paid holidays, paid sick leave, and paid parental leave; adoption assistance; Employee Stock Purchase Plan; financial planning and group legal; and voluntary benefits including auto, homeowner, and pet insurance.
