Reveille Technologies

Network Engineer

Reveille Technologies, Santa Clara, California, US, 95053

Top Skills: Data Center & AI Cluster Networking: High-performance interconnects – GPU, HPC, AI clusters; InfiniBand, Ultra Ethernet, RoCEv2, DCQCN; Dark Fiber / Carrier Interconnect Optimization; Hybrid DC Network Architecture & Fabric Design.

Job Description / Responsibilities

This is a hands‑on network engineering position focused on the architecture, design, development, and deployment of ultra‑high‑speed, resilient, and scalable DC AI clusters and interconnects for GPU‑accelerated data centers and compute clusters.

Outstanding problem‑solving abilities, a comprehensive understanding of network security protocols and standards, routing, switching, and automation, and a deep understanding of fundamental network theory are also critical to your success.

Lead the architecture, design, and deployment of global‑scale DC interconnects and fabrics for HPC, AI, and GPU computing clusters.

Develop high‑performance data center fabric using InfiniBand, Ultra Ethernet and related technologies.

Optimize carrier interconnects, intra‑ and inter‑DC routing, and dark fiber deployments to ensure low latency and high reliability.

Partner with system, OS, GPU, and HPC teams to deliver scalable, highly available networks for extreme‑performance workloads.

Implement network monitoring, telemetry, troubleshooting, and continuous performance improvement processes.

Drive technology selection, vendor engagement and lifecycle management for Data Center hardware and software.

What We're Looking For

Minimum 6-8 years of experience building, managing, and supporting large-scale hybrid networks and developing automation pipelines with Python, Ruby, Go, or other languages used in infrastructure automation.

SME in networking technologies: InfiniBand, Ultra Ethernet, RoCEv2, DCQCN, TCP/UDP, IPv4/IPv6, BGP/MP-BGP, VPN, L2 switching, EVPN, VXLAN, Segment Routing, MPLS.

Experience automating network infrastructure using an automated configuration management system (Python, Ansible, Salt, etc.).
