Ll Oefentherapie

Sr Software Engineer - Network Reliability Engineering - AI/ML

Ll Oefentherapie, Austin, Texas, US, 78716

Description

Oracle Cloud Infrastructure (OCI) provides mission-critical cloud services to enterprises worldwide. The Network Reliability Engineering (NRE) Automation, Reporting, and Tooling team builds innovative solutions that boost the productivity and efficiency of the Global Network Operations Center (GNOC). Our tooling empowers the GNOC and NRE teams with observability, automation, and actionable insights at hyperscale.

As a Sr Software Engineer, you will design, build, and deliver scalable automation frameworks and advanced platforms leveraging AI/ML to drive operational excellence across OCI's global network. This includes building network event-driven data (such as failures), hybrid classification, and both training and inference.

You are passionate about developing software that solves real-world operational challenges, thrive in a fast-paced team, and are comfortable working with complex distributed systems. You value simplicity, scalability, and collaboration.

Responsibilities

Architect, build, and support distributed systems for process control and execution based on Product Requirement Documents (PRDs).
Develop and sustain DevOps tooling, new product process integrations, and automated testing.
Develop ML in Python 3; build backend services in Go (Golang); create command-line interface (CLI) tools in Rust or Python 3; and integrate with other services as needed using Go, Python 3, or C.
Build and maintain schemas/models to ensure every platform and service write is captured for monitoring, debugging, and compliance.
Build and maintain dashboards that monitor the quality and effectiveness of service execution for the "process as code" your team delivers.
Build automated systems that route code failures to the appropriate on-call engineers and service owners.
Ensure high availability, reliability, and performance of developed solutions in production environments.
Support serverless workflow development for workflows that call and utilize the above-mentioned services to support our GNOC, GNRE, and onsite operations and hardware support teams.
Participate in code reviews, mentor peers, and help build a culture of engineering excellence.
Operate in an Extreme Programming (XP) asynchronous environment (chat/tasks) without daily standups, and keep work visible by continuously updating task and ticket states in Jira.

Required Qualifications

3 - 5+ years of experience in process as code, software engineering, automation development, or similar roles
Bachelor's degree in Computer Science and Engineering or a related engineering field
Strong coding skills in Go and Python 3
Experience with distributed systems, microservices, and cloud-native technologies
Proficiency in Linux environments and scripting languages
Proficiency with database creation, maintenance, and code using SQL and Go or Python 3 libraries
Understanding of network operations or large-scale IT infrastructure
Excellent problem-solving, organizational, and communication skills
Experience using AI coding assistants or AI-powered tools to help accelerate software development, including code generation, code review, or debugging

Preferred Qualifications

Process engineering experience (control systems, proportional-integral-derivative (PID) control, statistical process control (SPC))
Proficiency with data modeling, data analysis, and reporting frameworks (e.g., SQL, Spark, Prometheus, Grafana)
Experience with C, C++, Java, or Rust
Experience developing automation and tools for network or cloud operations at scale
Background in creating dashboards, alerts, and real-time reporting platforms
Familiarity with workflow automation (e.g., Apache Airflow), CI/CD pipelines, or infrastructure as code
Previous experience supporting or building tools for (any) hyperscale or at-scale cloud network, compute, or storage operations
Knowledge of REST APIs, remote procedure calls (RPCs), and service-oriented architectures (SOA)
Familiarity with Extreme Programming (XP), agile, and DevOps processes
Experience with ticketing and version control systems (e.g., Jira, Git)
