Celestica Inc.
Staff Test Engineer, AI Data Center Infrastructure (Austin)
Celestica Inc., Granite Heights, Wisconsin, United States
Date:
Oct 13, 2025
Location:
Merrimack, TX, US
Job Title:
Staff Engineer, Software
Functional Area:
Engineering (ENG)
Career Stream:
Design - Software Engineering
Job Code:
SEN-ENG-DSE
Job Band:
10
Direct/Indirect Indicator:
Indirect
Summary
Define and implement test strategies for all storage and server components, including hardware, firmware, and software.
Lead the definition and development of holistic test strategies, test plans, and test cases for complex data center network solutions, including Layer 2/3, SDN, and DCN.
Design, execute, and analyze complex test cases for functional, performance, reliability, stress, and endurance testing.
Develop and maintain automated test frameworks and scripts using languages like Python or Go to increase testing efficiency and coverage.
Conduct in-depth performance analysis and bottleneck identification for storage systems (NVMe, SSD/HDD arrays, distributed storage, SAN/NAS) and server platforms (CPU, GPU, memory, PCIe, networking, and OpenBMC).
Collaborate closely with hardware design, software development, and AI/ML engineering teams to understand requirements and integrate testing throughout the product lifecycle.
Create and maintain testbeds and infrastructure for continuous integration and validation.
Communicate test progress, results, and critical issues to stakeholders.
Develop specialized test methodologies to validate performance and reliability under heavy AI/ML workloads, such as large model training and inference.
Analyze and test the interactions between GPU-accelerated computing, high-speed networking, and storage systems.
Required Qualifications
Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related technical field.
10+ years of experience in hardware and/or software testing, with at least 5 years focused on enterprise-level storage and server systems.
5+ years of experience in a lead or senior technical role, mentoring junior engineers or leading test initiatives.
Expert-level understanding and hands‑on experience with Layer 2 and Layer 3 networking protocols in large‑scale data center environments (e.g., BGP, OSPF, IS-IS, MPLS).
Extensive experience with modern data center interconnect technologies (EVPN/VXLAN).
Extensive experience in hardware and/or software testing, with a strong focus on enterprise-level storage and server systems.
Deep expertise in various storage technologies, including NVMe, SAS/SATA SSDs/HDDs, RAID, distributed file systems (e.g., Ceph, Lustre), SAN, and NAS.
Strong understanding of server architectures (x86, ARM, GPU servers), CPU/memory subsystems, PCIe, power management, and Baseboard Management Controller (BMC) functionality.
Proficiency in scripting languages like Python or Bash for test automation and data analysis.
Experience with Linux operating systems (e.g., Ubuntu, CentOS, RHEL) and command‑line tools.
Knowledge of networking concepts (Ethernet, TCP/IP, InfiniBand) and network testing methodologies.
Familiarity with test methodologies such as performance testing, reliability testing, stress testing, and fault injection.
Excellent problem‑solving, analytical, and debugging skills.
Strong communication and collaboration skills to work effectively with diverse teams.
Preferred Qualifications
Experience with OCP (Open Compute Project).
Familiarity with cloud environments (AWS, Azure, GCP) and virtualization technologies.
Knowledge of containerization technologies (Docker, Kubernetes).
Experience with AI/ML frameworks (e.g., TensorFlow, PyTorch) and their infrastructure requirements.
Experience with performance profiling tools (e.g., fio, Iometer).
Industry certifications: CCIE, CompTIA Network+, NVIDIA-Certified Professional, Dell ISM.
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran. Celestica's policy on equal employment opportunity prohibits discrimination based on race, color, creed, religion, national origin, gender, sexual orientation, gender identity, age, marital status, veteran or disability status, or other characteristics protected by law. This policy applies to hiring, promotion, discharge, pay, fringe benefits, job training, classification, referral and other aspects of employment and also states that retaliation against a person who files a charge of discrimination, participates in a discrimination proceeding, or otherwise opposes an unlawful employment practice will not be tolerated. All information will be kept confidential according to EEO guidelines.