Delaware Staffing
Software Developer 5 - OCI AI Platform
Delaware Staffing, Dover, Delaware, United States, 19904
Oracle Cloud Infrastructure Consulting Software Engineer
Oracle Cloud Infrastructure (OCI) is Oracle's next-generation cloud platform, engineered to handle the most demanding enterprise workloads. Within OCI, the AI Platform organization is building a comprehensive cloud service to support the full lifecycle of AI and machine learning, from GPU infrastructure and training pipelines to model serving and deployment tools, enabling Oracle teams and customers to build and deploy AI at scale.

We are looking for a Consulting Software Engineer to join our growing team and help shape the future of AI infrastructure and services at Oracle. This role will focus on critical components of OCI's AI platform, including large-scale GPU cluster management, self-service ML infrastructure, and end-to-end model lifecycle capabilities such as training and serving.

Help shape the core infrastructure powering Oracle's generative AI and machine learning solutions.
Tackle some of the most challenging problems in AI infrastructure at enterprise scale.
Collaborate with world-class teams and leaders driving innovation in cloud and AI.
Be part of a high-visibility initiative central to Oracle's future.

This role requires strong technical and leadership skills, with a deep understanding of cloud-native infrastructure, distributed systems, and modern AI/ML workloads. You will collaborate across OCI and Oracle's product teams to power internal and customer-facing AI solutions at scale.

As a Consulting Software Engineer on the team, you will work with teams of software engineers responsible for the design, development, and operation of our new and existing features. You should be able to architect broad system interactions, be hands-on, dive deep into any part of the stack, and have a good working knowledge of cloud infrastructure and networking. You should be able to work seamlessly in a collaborative, agile environment, and be excited to learn. IC5s work independently and provide technical leadership to the broader organization.
You should have experience developing and operating high-scale services, and an understanding of how to make these cloud-scale services resilient and how to balance speed and quality through iterative and incremental improvements. Understand operational excellence and know how to infuse a culture of being proactive within your team. Recommend and justify major changes to new and existing products and establish consensus with data-driven approaches.

What You'll Do

Build cloud services on top of the modern Infrastructure as a Service (IaaS) building blocks at OCI
Design and build distributed, scalable, fault-tolerant software systems
Participate in the entire software lifecycle: development, testing, CI, and production operations
Design and lead software projects without needing significant guidance, and guide, mentor, and coach junior engineers
Balance product feature development with production operational concerns such as writing runbooks, ops automation, structured logging, and instrumentation for metrics and events
Leverage internal tooling at OCI to develop, build, deploy, and troubleshoot software
Participate in on-call for the service with the team

Qualifications

12+ years of experience shipping scalable, cloud-native distributed systems
Experience building multi-tenant Kubernetes and security isolation
Experience building Kubernetes controllers, CRDs, and admission webhooks to automate lifecycle management of AI/ML workloads
Experience implementing advanced optimizations: distributed and disaggregated inference serving, multi-node inference, KV-cache reuse
Experience building intelligent request routing and adaptive scheduling to maximize GPU utilization
Experience with inference solutions such as Nvidia Dynamo, vLLM, and Ray Serve
Experience with production operations and best practices for putting quality code in production and troubleshooting issues when they arise
Able to effectively communicate technical ideas verbally and in writing (technical proposals, design specs, architecture diagrams, and presentations)
Experience in Go, Java, Python

Preferred Qualifications

MS in Computer Science
Experience building control plane/data plane solutions for cloud-native companies
Experience in diagnosing, troubleshooting, and resolving performance issues in complex environments
Deep understanding of Unix-like operating systems
Production experience with Cloud and ML technologies
Generative AI, LLM, and machine learning experience

Disclaimer: Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.

Range and benefit information provided in this posting is specific to the stated locations only.

US: Hiring Range in USD from: $96,800 to $251,600 per annum. May be eligible for bonus, equity, and compensation deferral.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions, and locations, as well as to reflect Oracle's differing products, industries, and lines of business. Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.

Oracle US offers a comprehensive benefits package which includes the following:

Medical, dental, and vision insurance, including expert medical opinion
Short term disability and long term disability
Life insurance and AD&D
Supplemental life insurance (Employee/Spouse/Child)
Health care and dependent care Flexible Spending Accounts
Pre-tax commuter and parking benefits
401(k) Savings and Investment Plan with company match
Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
11 paid holidays
Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
Paid parental leave
Adoption assistance
Employee Stock Purchase Plan
Financial planning and group legal
Voluntary benefits including auto, homeowner, and pet insurance