Ll Oefentherapie

Sr Software Engineer - AI Infrastructure

Ll Oefentherapie, Santa Clara, California, US, 95053

Overview

Job Title: Senior Software Engineer - AI Infrastructure

Oracle Cloud Infrastructure (OCI) is looking for a Senior Software Engineer to lead the development of scalable, resilient, and secure infrastructure systems that underpin the core of OCI's compute platform. This role sits within the Host Provisioning Services (HoPS) team, which owns the critical infrastructure responsible for automating the full server lifecycle, from rack integration and hardware bring-up to customer-ready instance provisioning and firmware management.

HoPS services operate at the intersection of bare metal hardware and full-stack orchestration frameworks, interfacing with components like BMCs, NICs, SmartNICs, ILOMs, GPUs, and custom firmware stacks. The team builds microservices and tooling that provision, configure, secure, and validate server platforms across OCI's global fleet.

Responsibilities

Design and deliver highly available services and automation pipelines that manage server provisioning at hyperscale.

Enable firmware pinning for deterministic customer environments and deliver fleet-wide firmware updates and telemetry-based observability.

Develop solutions to support new silicon platforms (e.g., NVIDIA, AMD, Intel) and SmartNIC/HostNIC convergence.

Advance RoT (Root of Trust) security integration and evolve OCI's infrastructure toward next-generation clusters and composable hardware environments.

Partner with Compute, Networking, Security, Datacenter Engineering, and Hardware Development teams to launch, scale, and maintain new server platforms with high reliability and low operational overhead.

Qualifications

Experience as a systems engineer with a deep understanding of operating systems, hardware-software integration, distributed services, and cloud-scale automation. Familiarity with server hardware, firmware, BMCs, NICs, GPUs, and related management stacks. Ability to design and operate at hyperscale with a focus on reliability, observability, and security.

This role is ideal for professionals who are able to work across multiple teams to enable OCI to launch, scale, and maintain new server platforms with minimal operational overhead and high reliability.