META

Senior Production Network Engineer

META, Menlo Park, California, United States, 94029

Summary:

Are you ready to be part of a rapidly growing AI Training and Inference Infrastructure at Meta? We are looking for talented engineers to help us build, scale, and enhance our network infrastructure that is pivotal in connecting numerous GPUs for a wide range of AI use cases. We believe in creating simple, elegant, and scalable network designs, and your expertise in automation and data analytics will be crucial to meet the ever-growing demands. In this exciting role, you will join a dynamic team focused on designing innovative solutions and developing, testing, and deploying network software, systems, and tools that ensure maximum reliability, scalability, and efficiency of our Data Center network. As a hybrid software and network engineer, you will utilize your network engineering skills to research and design next-generation network architectures and leverage your software development capabilities to implement them at scale.

Key Responsibilities:

Collaborate with hardware, software, and vendor teams to create cutting-edge network topologies and platforms (switch and optics).
Partner with in-house Software Engineers and Tooling, Planning, Simulation, and Delivery teams to codify network designs.
Develop automated testing frameworks that are integrated in the Continuous Integration/Continuous Deployment pipeline to validate network hardware and software stacks for both in-house Facebook Open Switching System (FBOSS) and Vendor platforms.
Perform thorough testing of complex network migration procedures in lab environments before executing in production.
Work closely with hardware, software, and sourcing teams to develop innovative networking solutions that will influence the future of networking infrastructure.
Be on-call to learn from real-world production challenges and use those experiences to drive improvements in current and future products.

Minimum Qualifications:

Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience.
6+ years of experience working with networks that support large-scale training workloads.
Proficiency in designing, deploying, and operating datacenter networks at scale.
Experience in coding with languages such as Python, C++, or Go.
Familiarity with network automation software that utilizes software-defined networking principles.
Experience in configuring and troubleshooting routing and switching protocols, including BGP, IS-IS, OSPF, MPLS, and RSVP-TE.
Working knowledge of network protocols (TCP/UDP, DHCP, DNS) and experience with IPv4 and IPv6.

Preferred Qualifications:

Understanding of AI training workloads and the demands they place on networks.
Knowledge of RDMA congestion control mechanisms on RoCE Networks.
Familiarity with 40/100/400G Ethernet and CWDM, DWDM, and optical transport network technologies.
Insight into different Optics and the internals of switch ASICs.

Compensation:

The annual salary ranges from $147,000 to $208,000, plus bonus, equity, and benefits.

Please note that Meta is an Equal Employment Opportunity and Affirmative Action employer, and we do not discriminate based on race, religion, color, national origin, sex, sexual orientation, gender identity, disability, or any other legally protected characteristics. We provide reasonable accommodations for candidates with disabilities throughout the application process. If you need assistance, please reach out to accommodations-ext@fb.com.