Oracle
Distributed Systems Engineer – AI Infrastructure
Oracle Cloud Infrastructure (OCI) is building the world’s largest AI clusters and serving the fastest‑growing market for GPU‑focused cloud solutions. The AI Infrastructure organization leads this effort by delivering highly secure, reliable, and scalable GPU compute services. This role is an opportunity to design and optimize the control and data planes of AI infrastructure for a global market.
Responsibilities
Design and develop scalable AI compute infrastructure, focusing on the GPU control plane and GPU data plane, to enhance customer experience and workload performance.
Devise “best‑in‑class” AI compute services that are modular, secure, reliable, diagnosable, actively monitored, compliant, and reusable.
Collaborate across development, operations, and product management to understand requirements and design orchestration solutions.
Mentor junior developers and champion modern engineering practices, including telemetry‑driven decisions, well‑defined component interfaces, design reviews, coding standards, and comprehensive testing.
Develop benchmark metrics and automation to drive performance and reliability across customer workloads and underlying infrastructure.
Qualifications
BS (or equivalent) in Computer Science, Engineering, or related field.
6+ years of software development experience with C, C++, C#, Java, Go, or Rust.
3+ years designing and developing large‑scale infrastructure, distributed systems, and services.
1+ year providing technical leadership and clarity across cross‑functional teams.
Strong problem‑solving, communication, ownership, and drive.
Adaptability to fast‑paced, dynamic environments and effective multitasking.
Preferred Qualifications
Experience managing cloud infrastructure with hundreds of thousands of servers.
Experience with Docker and Kubernetes.
Experience scheduling high‑performance workloads on Kubernetes or Slurm.
Benefits and Compensation
Medical, dental, and vision insurance.
Short‑term and long‑term disability.
Life insurance and AD&D.
Supplemental life insurance for employees/spouse/child.
Health, dependent care, and commuter flexible spending accounts.
Pre‑tax commuter and parking benefits.
401(k) with company match.
Paid time off: flexible vacation, sick leave, parental leave, adoption assistance.
Employee Stock Purchase Plan.
Financial planning and group legal.
Voluntary benefits including auto, homeowner, and pet insurance.
EEO Statement
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.