Google
Staff Quality and Reliability Engineer, Google Cloud
Google, Sunnyvale, California, United States, 94087
In this role, you’ll work to shape the future of AI/ML hardware acceleration. You will have an opportunity to drive cutting‑edge TPU (Tensor Processing Unit) technology that powers Google’s most demanding AI/ML applications. You’ll be part of a team that pushes boundaries, developing custom silicon solutions that power the future of Google’s TPU. You’ll contribute to the innovation behind products loved by millions worldwide, and leverage your design and verification expertise to verify complex digital designs, with a specific focus on TPU architecture and its integration within AI/ML‑driven systems.
As a Quality and Reliability Engineer for Google Cloud, you will lead the development of Design‑for‑Reliability guidelines and drive the adoption of advanced technologies to optimize silicon production and reliability. You will be responsible for ensuring that High Performance Computing (HPC) SoC products meet stringent quality requirements by collaborating across design, manufacturing, and hardware teams to execute comprehensive test plans. Additionally, you will own the cross‑functional investigation and root‑cause analysis of integrated circuit (IC) issues to develop effective solutions in a production environment.
The AI and Infrastructure team is redefining what’s possible. We empower Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability and velocity. Our customers include Googlers, Google Cloud customers, and billions of Google users worldwide.
Minimum qualifications
Bachelor’s degree in Electrical Engineering, Computer Engineering, Computer Science, or a related field, or equivalent practical experience.
8 years of experience in reliability or product quality engineering (e.g., working on ICs, SoCs, or microprocessors).
Experience with silicon or semiconductor manufacturing or fab processes (e.g., CMOS, FinFET, or device physics).
Experience with advanced manufacturing nodes (e.g., 5nm, 3nm) or assembly (e.g., 2.5D, 3D, or chiplet packaging).
Experience in a production or manufacturing environment (e.g., failure analysis, root‑cause analysis, or RMA processes).
Preferred qualifications
Master’s degree or PhD in Electrical Engineering, Computer Engineering or Computer Science, with an emphasis on computer architecture.
Experience with chiplets and high‑power devices.
Experience in data analytics to identify commonalities and abnormalities.
Experience in semiconductor reliability and manufacturing processes (fab, assembly, test), or IC and packaging failure mechanisms and related failure analysis.
Knowledge of design‑for‑reliability guidelines and implementation techniques.
Familiarity with test methods and hardware for silicon qualification (e.g., HTOL chambers, ESD, LU).
Responsibilities
Own development of Design‑for‑Reliability guidelines, collaborating with subject‑area experts (e.g., SER, EMIR, PERC, HVDRC, margining, etc.).
Facilitate technology adoption to optimize production and reliability (embedded sensors, in‑field monitor/debug, etc.).
Collaborate with design, manufacturing, silicon engineering, and hardware/component quality teams to ensure HPC SoC silicon products meet quality and reliability requirements (mission profile, DPPM/FIT, aging, etc.).
Partner with cross‑functional organizations to design and execute quality and reliability test plans (HTOL, ELFR, ESD/LU, b/HAST, THB, etc.) and production reliability methods (HVS and other methods).
Own cross‑functional investigation of IC quality and reliability issues to identify root causes and develop solutions (RMA triage, analytics, failure analysis, etc.).
Google is proud to be an equal‑opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. See also Google’s EEO Policy and EEO is the Law. If you have a disability or a special need that requires accommodation, please let us know by completing our Accommodations for Applicants form.