Oregon Staffing

AI & High Performance Computing (HPC) Network Engineer

Oregon Staffing, Beaverton, Oregon, US, 97078


Network Engineer

We are looking for a Network Engineer to design, deploy, and troubleshoot high-throughput, low-latency networks that support large-scale AI training and inference workloads. In this role, you'll work at the intersection of high-performance computing and hyperscale infrastructure, enabling next-generation AI systems through robust and scalable networking solutions. You'll leverage advanced networking protocols and technologies (such as RoCEv2, InfiniBand, and SmartNICs) while applying foundational expertise in L2/L3 networking, telemetry, and diagnostics. A strong background in Linux-based systems, network automation (Python, Ansible, Terraform), and performance tuning is essential. This is a high-impact opportunity to shape the backbone of AI infrastructure at hyperscale.

Responsibilities:
- Design and deploy high-throughput, low-latency network systems to support AI training and inference infrastructure.
- Troubleshoot and optimize Linux-based networking stacks across thousands of nodes.
- Tune Linux kernel networking parameters for performance.
- Automate network provisioning, monitoring, and diagnostics using Python, Ansible, Terraform, or equivalent tools.
- Implement and manage L2/L3 network topologies, including EVPN-VLAN, BGP, OSPF, and static routing.
- Support and troubleshoot InfiniBand, RoCEv2, SR-IOV, and/or SmartNIC-based deployments.
- Analyze performance metrics to identify and resolve packet loss, congestion, jitter, and other network anomalies.
- Integrate telemetry and logging solutions using tools such as Prometheus, Grafana, and sFlow/NetFlow.
- Collaborate with security and platform teams to implement network segmentation, ACLs, and policy enforcement.
- Participate in design reviews, capacity planning, and incident response for critical infrastructure.

Travel may be required for this role. The amount of travel will vary from 25% to 100% depending on business need and client requirements.

Basic Qualifications:
- 5+ years of experience as a Network Engineer working in Linux-dominant environments.
- Strong understanding of the TCP/IP stack, multicast, DNS, DHCP, NAT, and QoS.
- Hands-on experience with network configuration on Linux systems (e.g., Netplan, systemd-networkd, NetworkManager).
- Proven skills in scripting and automation (e.g., Python, Bash, Git).
- Experience deploying and managing enterprise-grade switches, routers, and NICs (e.g., Arista, Juniper, Mellanox, Broadcom).
- Ability to troubleshoot across the physical, data link, and network layers using tools such as tcpdump, iperf, ethtool, and nmap.
- Bachelor's degree, or equivalent work experience (minimum 12 years). With an Associate's degree, a minimum of 6 years of work experience is required.

Preferred Qualifications:
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, or equivalent experience.
- Experience in hyperscale or high-performance compute infrastructure, supporting AI/ML or cloud-scale services.
- Familiarity with network and HPC monitoring/provisioning tools used by hyperscale platforms.
- Understanding of network security principles, including firewalls, VPNs, microsegmentation, and secure provisioning.
- Experience with zero-touch provisioning, out-of-band management, and network bootstrap pipelines.