Energy Jobline ZR
Reliability Engineering - Datacenter RAS in Santa Clara
Energy Jobline ZR, Santa Clara, California, US, 95053
About Celestial AI
As generative AI continues to advance, the performance drivers for data center infrastructure are shifting from systems‑on‑chip (SoCs) to systems of chips. In the era of accelerated computing, data center bottlenecks are no longer limited to compute performance; they now include the system's interconnect bandwidth, memory bandwidth, and memory capacity. Celestial AI's Photonic Fabric™ is a next‑generation interconnect technology that delivers a tenfold increase in performance and energy efficiency compared to competing solutions.
The Photonic Fabric™ is available in multiple technology offerings, including optical interface chiplets, optical interposers, and Optical Multi‑chip Interconnect Bridges (OMIB). This allows customers to easily incorporate high‑bandwidth, low‑power, and low‑latency optical interfaces into their AI accelerators and GPUs. The technology is fully compatible with standard 2.5D packaging processes. This seamless integration enables XPUs to use optical interconnects for both compute‑to‑compute and compute‑to‑memory fabrics, achieving bandwidths in the tens of terabits per second with nanosecond latencies.
This innovation empowers hyperscalers to enhance the efficiency and cost‑effectiveness of AI processing by optimizing the XPUs required for training and inference, while significantly reducing the TCO2 impact. To bolster customer collaborations, Celestial AI is developing a Photonic Fabric ecosystem consisting of tier‑1 partnerships that include custom silicon/ASIC design, system integrators, HBM memory, assembly, and packaging suppliers.
About the Role
We are seeking a highly motivated Reliability Engineering Intern to join our Datacenter RAS (Reliability, Availability, and Serviceability) team, with a focus on silicon photonics integration. This role is ideal for students interested in the intersection of hardware reliability, optical interconnects, and large‑scale system performance.
You will evaluate and improve the reliability of silicon photonics components and subsystems deployed in hyperscale data center environments, contributing to the long‑term uptime and serviceability of next‑generation compute and networking infrastructure.
Essential Duties and Responsibilities
Support the development and execution of RAS strategies for silicon photonics‑based interconnects in data center systems.
Assist in reliability testing, lifetime modeling, and failure mode analysis of photonic components (e.g., lasers, modulators, photodetectors, optical transceivers).
Analyze field return data and lab test results to identify trends, root causes, and opportunities for design or process improvements.
Collaborate with cross‑functional teams (hardware, packaging, systems, and software) to ensure RAS requirements are met for photonic integration.
Contribute to the development of monitoring and diagnostics tools for early detection of photonic degradation or failure in deployed systems.
Help build or enhance data pipelines and dashboards for tracking reliability metrics and system health indicators.
Document findings and present recommendations to engineering and leadership teams.
Qualifications
Pursuing a Bachelor's, Master's, or Doctorate in Electrical Engineering, Optical Engineering, Computer Engineering, or a related field.
Knowledge of silicon photonics and optical communication systems.
Familiarity with RAS principles in large‑scale systems or data center environments is a strong plus.
Experience with data analysis tools (e.g., Python, MATLAB, JMP) and database systems.
Exposure to optical test equipment and reliability testing standards (e.g., Telcordia, JEDEC) is a plus.
Strong analytical, communication, and documentation skills.
Passion for solving complex problems at the intersection of hardware reliability and system‑level performance.
What You’ll Gain
Hands‑on experience with cutting‑edge silicon photonics technologies in real‑world data center applications.
Exposure to RAS methodologies and system‑level reliability engineering.
Mentorship from industry experts and opportunities to present your work to technical leaders.
A chance to contribute to the future of scalable, high‑speed, and energy‑efficient data infrastructure.
Location
Santa Clara, CA
Compensation
This paid summer internship offers a competitive hourly rate of $40.00. Please note that as an intern, you will not be eligible for company‑sponsored benefits, including paid time off, health insurance, life insurance, stock options, or retirement plans.
EEO Statement
Celestial AI Inc. is proud to be an equal opportunity workplace and is an affirmative action employer.