Microsoft
Microsoft Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE) is the team behind Microsoft’s expanding Cloud Infrastructure and responsible for powering Microsoft’s “Intelligent Cloud” mission. SCHIE delivers the core infrastructure and foundational technologies for more than 200 Microsoft online businesses worldwide, including Bing, MSN, Office 365, Xbox Live, Teams, OneDrive, and the Microsoft Azure platform, through server and data center infrastructure, security and compliance, operations, globalization, and manageability solutions. Our focus is on smart growth, high efficiency, and delivering a trusted experience to customers and partners worldwide. We are looking for passionate, high-energy engineers to help achieve that mission.
As Microsoft’s cloud business continues to grow, the ability to deploy new offerings and hardware infrastructure on time, in high volume with high quality and lowest cost is paramount. The Hardware, Infrastructure Management, and Fundamentals Engineering (HIFE) team defines and delivers operational measures of success for hardware manufacturing, improving the planning process, quality, delivery, scale and sustainability related to Microsoft cloud hardware. We are looking for seasoned engineers with a dedicated passion for customer‑focused solutions, insight, and industry knowledge to envision and implement future technical solutions that will manage and optimize the Cloud infrastructure.
We are looking for a Principal Hardware Quality Engineer to join the team.

Responsibilities
Hands‑on debugging in data centers (onsite and virtual)
Develop and implement a robust supplier quality management strategy to ensure data center hardware is manufactured to the highest quality standards
Lead cross‑functional resolution of critical & high‑severity issues across data centers, development, and suppliers
Conduct hands‑on debugging in global data centers including GPU subsystem failure analysis
Drive continuous improvement based on root cause analysis (RCA) and identified opportunities
Manage multiple NPI builds and quality phase‑gate deliverables for the manufacturing team throughout the engineering development lifecycle, from concept through production readiness
Establish critical‑to‑quality performance metrics to measure and improve product quality
Advocate as voice of quality in hardware change management process, ensuring quality requirements are considered and met

Required Qualifications
Doctorate Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 5+ years of technical engineering experience
OR Master’s Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 7+ years of technical engineering experience
OR Bachelor’s Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 8+ years of technical engineering experience
8+ years of work experience in managing product quality in the electronic industry
5+ years of direct engineering experience in hardware system issue resolution for GPU Servers
3+ years of experience with query languages like SQL for debugging data, telemetry, and logs to identify and investigate hardware failure signatures

Other Requirements
Ability to meet Microsoft security screening requirements. This role requires passing the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications
Master’s degree in Electrical Engineering, Software Engineering, or System Engineering (or 12+ years of equivalent experience)
Patent or track record of engineering excellence
Experience with liquid cooling systems in data centers
12+ years of experience working with modern server architectures, including understanding of GPU, GPU system hardware, memory or CPU, and methods for failure analysis, debugging, or validation
12+ years of proven success leading resolution of critical quality issues across data centers
8+ years of system‑level server debugging with understanding of power, system, and network environments
3+ years of direct GPU‑related engineering experience in issue debug/test log review
Leadership skills and ability to collaborate with diverse teams and drive a call to action

Reliability Engineering IC5 – The typical base pay range for this role across the U.S. is USD $139,900 – $274,800 per year. For San Francisco Bay area and New York City metropolitan area, the base pay range is USD $188,000 – $304,200 per year. Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: US corporate pay information | Microsoft Careers.

Microsoft will accept applications for the role until Oct 29th, 2025.

Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances.
If applicants need assistance or a reasonable accommodation due to a disability during the application process, read about accommodations.