Energy Jobline ZR

Reliability Engineering - Datacenter RAS in Santa Clara

Energy Jobline ZR, Santa Clara, California, US, 95053


About Celestial AI

As Generative AI continues to advance, the performance drivers for data center infrastructure are shifting from systems‑on‑chip (SoCs) to systems of chips. In the era of Accelerated Computing, data center bottlenecks are no longer limited to compute performance; they now include the system's interconnect bandwidth, memory bandwidth, and memory capacity. Celestial AI's Photonic Fabric™ is the next‑generation interconnect technology that delivers a tenfold increase in performance and energy efficiency compared to competing solutions.

The Photonic Fabric™ is available in multiple technology offerings, including optical interface chiplets, optical interposers, and Optical Multi‑chip Interconnect Bridges (OMIB). This allows customers to easily incorporate high‑bandwidth, low‑power, and low‑latency optical interfaces into their AI accelerators and GPUs. The technology is fully compatible with standard 2.5D packaging processes. This seamless integration enables XPUs to use optical interconnects for both compute‑to‑compute and compute‑to‑memory fabrics, achieving bandwidths in the tens of terabits per second with nanosecond latencies.

This innovation empowers hyperscalers to enhance the efficiency and cost‑effectiveness of AI processing by optimizing the XPUs required for training and inference, while significantly reducing the TCO2 impact. To bolster customer collaborations, Celestial AI is developing a Photonic Fabric ecosystem consisting of tier‑1 partnerships that include custom silicon/ASIC design, system integrators, HBM memory, assembly, and packaging suppliers.

About the Role

We are seeking a highly motivated Reliability Engineering Intern to join our Datacenter RAS (Reliability, Availability, and Serviceability) team, with a focus on Silicon Photonics integration. This role is ideal for students interested in the intersection of hardware reliability, optical interconnects, and large‑scale system performance.

You will work on evaluating and improving the reliability of silicon photonics components and subsystems deployed in hyperscale data center environments, contributing to the long‑term uptime and serviceability of next‑generation compute and networking infrastructure.

Essential Duties and Responsibilities

Support the development and execution of RAS strategies for silicon photonics‑based interconnects in data center systems.

Assist in reliability testing, lifetime modeling, and failure mode analysis of photonic components (e.g., lasers, modulators, photodetectors, optical transceivers).

Analyze field return data and lab test results to identify trends, root causes, and opportunities for design or process improvements.

Collaborate with cross‑functional teams (hardware, packaging, systems, and software) to ensure RAS requirements are met for photonic integration.

Contribute to the development of monitoring and diagnostics tools for early detection of photonic degradation or failure in deployed systems.

Help build or enhance data pipelines and dashboards for tracking reliability metrics and system health indicators.

Document findings and present recommendations to engineering and leadership teams.

Qualifications

Pursuing a Bachelor's, Master's, or Doctorate in Electrical Engineering, Optical Engineering, Computer Engineering, or a related field.

Knowledge of silicon photonics and optical communication systems.

Familiarity with RAS principles in large‑scale systems or data center environments is a strong plus.

Experience with data analysis tools (e.g., Python, MATLAB, JMP) and database systems.

Exposure to optical test equipment and reliability testing standards (e.g., Telcordia, JEDEC) is a plus.

Strong analytical, communication, and documentation skills.

Passion for solving complex problems at the intersection of hardware reliability and system‑level performance.

What You’ll Gain

Hands‑on experience with cutting‑edge silicon photonics technologies in real‑world data center applications.

Exposure to RAS methodologies and system‑level reliability engineering.

Mentorship from industry experts and opportunities to present your work to technical leaders.

A chance to contribute to the future of scalable, high‑speed, and energy‑efficient data infrastructure.

Location

Santa Clara, CA

Compensation

This paid summer internship offers a competitive hourly rate of $40.00. Please note that as an intern, you will not be eligible for company‑sponsored benefits, including paid time off, health insurance, life insurance, stock options, or retirement plans.

EEO Statement

Celestial AI Inc. is proud to be an equal opportunity workplace and is an affirmative action employer.
