Ll Oefentherapie

Principal Network Reliability Engineer

Ll Oefentherapie, Boise, Idaho, United States, 83708


About the Role

Oracle Cloud Infrastructure’s Network Reliability Engineering (NRE) team ensures the global OCI network (spanning thousands of routers, optical systems, and backbone links) achieves the highest levels of availability, resiliency, and automation maturity.

We are seeking a Principal NRE to design and evolve OCI’s autonomous network operations platform — integrating advanced AI/ML systems, telemetry pipelines, and automation frameworks to detect, predict, and remediate issues before they impact customers.

This is a hands‑on technical leadership role for engineers who thrive at the intersection of network architecture, distributed systems, and applied machine intelligence.

Qualifications

Bachelor’s or Master’s degree in Computer Science or a related technical discipline.

10+ years in large-scale network operations or reliability engineering, preferably in hyperscale cloud or carrier environments.

Proven ability to define and improve SLIs/SLOs, MTTR, MTTD, automation coverage, and change reliability metrics to achieve 99.99%+ network availability.

Expert‑level knowledge of:

Core protocols: IPv4/IPv6, BGP, OSPF, IS‑IS, MPLS, EVPN, VXLAN, RSVP‑TE.

Network architecture (DC spine‑leaf, WAN, backbone, edge).

Telemetry, observability, and data collection systems.

Proven hands‑on experience in automation frameworks, scripting (Python, Go, Bash), and infrastructure as code (Terraform, Ansible, or equivalent).

Familiarity with AI/ML applications in network operations — anomaly detection, predictive analytics, clustering, or LLMs for diagnostics.

Deep understanding of service reliability concepts (SLOs, error budgets, MTTR, MTTD, change failure rate).

Experience with major vendor platforms: Juniper (MX, QFX, PTX), Arista, and Cisco NX‑OS.

Ability to lead cross‑functional technical projects from concept through global deployment.

Experience leading technical projects in both network and software engineering disciplines.

Preferred Experience

Experience designing data pipelines for telemetry, flow logs, and metrics aggregation.

Exposure to AIOps or MLOps frameworks (Kubeflow, SageMaker, OCI Data Science).

Involvement in network simulation/emulation systems for failure modeling (Batfish, ns‑3, Mininet).

Demonstrated leadership in cross‑domain reliability initiatives (NRE, SRE, GNOC, or Platform Ops).
