Oracle
Overview
Site Reliability Developer (JoinOCI-Ns2) at Oracle. Teams are located in Reston, VA; Austin, TX; and Seattle, WA.
Responsibilities
Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning. Collaborate with the Site Reliability Engineering (SRE) team on shared full-stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, dependencies, and behavior of production services. Design and deliver a mission-critical stack with emphasis on security, resiliency, scale, and performance. Provide end-to-end performance and operability accountability. Partner with development teams to define and implement improvements in service architecture, and communicate technical characteristics to guide development toward premier capabilities in the Oracle Cloud service portfolio. Understand the scale, capacity, security, performance attributes, and requirements of the stack. Demonstrate an understanding of automation and orchestration principles. Act as the escalation point for complex or critical issues not documented as SOPs. Use knowledge of service topology and dependencies to troubleshoot issues and define mitigations. Explain how product architecture decisions affect distributed systems. Maintain professional curiosity and a desire to develop a deep understanding of services and technologies. Spend significant time on ops work as well as on software engineering tasks that increase reliability and automation. Be a proficient programmer with a breadth of knowledge in networking, internet protocols, and Linux systems. Work with service owners and teammates in a DevOps-driven culture to deliver stable, scalable services while enabling agility for feature development.
Develop and maintain tools for operational efficiency, incident response, and automation; create and improve runbooks to reduce mean time to triage. Contribute to standard practices and procedures for the team and organization, and drive improvements in service architecture and operability. Deliver solutions that directly contribute to customer success.
Minimum Qualifications
Active TS/SCI clearance; Polygraph preferred; must be able to obtain and maintain an active Polygraph. BS in Computer Science or a related technical field, or equivalent practical experience. Proficiency in writing automation scripts in Python, Bash, Ruby, Perl, JavaScript, or Java. Strong written and verbal communication skills. Familiarity with core protocols (DNS, DHCP, HTTP, TCP); deep knowledge of Linux internals and host-based networking; expert Linux/Unix performance and stability troubleshooting. Familiarity with configuration management (Chef, Puppet) and monitoring solutions for large-scale environments. Experience with databases (Oracle DB, MySQL, Postgres) and shared file systems (Gluster, ZFS). Systems thinking with a systematic problem-solving approach; strong ownership and drive; ability to develop service metrics, dashboards, and alarms. Experience in an operational environment with mission-critical tier-one services and on-call duties. 5+ years of experience running large-scale, highly distributed services; 2+ years of managing host virtualization technologies (KVM, Containers, Docker, etc.).
Preferred Qualifications
Proficiency in coding distributed systems using Python, Ruby, Java, or C/C++. Deep knowledge of networking (TCP, UDP, DNS, DHCP, IPSec). Strong focus on building secure Internet-facing systems in hostile environments. 3+ years of production software development with Agile methodologies; 3+ years of managing host, network, or storage virtualization technologies. Expert troubleshooting skills and experience with fleet automation and management.
Job Details
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and IT
Industries: IT Services and IT Consulting
Additional Information
Disclaimer: Certain US customer or client-facing roles may require immunization and occupational health mandates. Salary range and benefits information is location-specific. US Hiring Range: 79,100 - 158,200 per year, with potential bonus and equity. Oracle offers a comprehensive benefits package including medical, dental, vision, disability, life and AD&D insurance; 401(k) with company match; paid time off; holidays; sick leave; parental leave; adoption assistance; stock purchase plan; and voluntary benefits.
About Us
Oracle is a world leader in cloud solutions. Oracle is an Equal Employment Opportunity employer. If you require accessibility assistance, contact accommodation-request_mb@oracle.com or +1 888 404 2494 in the United States.