Larsen & Toubro
AI Data Center Operations Manager
We are seeking an experienced AI Data Center Operations Manager to lead operations for a cutting-edge AI data center. This role involves managing all critical infrastructure systems—including power, mechanical, electrical, cooling, HVAC, liquid cooling, network, telco, chillers, water treatment plant, and generators—while ensuring optimal performance of NVIDIA B300 GPU clusters on Dell PowerEdge hardware. The position requires strong collaboration with vendors and internal teams to maintain high availability, security, and compliance.
Key Responsibilities
Infrastructure Operations
Oversee daily operations of power systems, mechanical/electrical infrastructure, HVAC, liquid cooling, chillers, water treatment plant, and backup generators.
Ensure continuous uptime and operational efficiency for AI compute clusters.
AI Hardware & Compute Management
Coordinate hardware upgrades and troubleshooting with engineering teams.
Act as the primary liaison with vendors, including server suppliers, HVAC providers, electrical contractors, and other critical service partners.
Negotiate and manage service agreements, maintenance schedules, and procurement.
Network & Telco
Maintain robust connectivity and manage network infrastructure supporting AI workloads.
Work with telecom providers to ensure redundancy and high availability.
Physical Security & Compliance
Partner with the physical security team to enforce access control, surveillance, and compliance with security protocols.
Ensure adherence to industry standards and environmental regulations.
Implement advanced monitoring systems for power, cooling, and compute resources.
Lead incident response and root cause analysis for outages or failures.
Qualifications
Bachelor’s degree in Electrical Engineering, Mechanical Engineering, Computer Science, or a related field (master’s preferred).
7+ years of experience in data center operations, with 3+ years managing AI or HPC environments.
Expertise in power systems, HVAC, liquid cooling, and mechanical/electrical infrastructure.
Hands‑on experience with GPU clusters (NVIDIA B300 preferred) and Dell PowerEdge hardware.
Strong knowledge of networking, telco systems, and high‑availability architectures.
Excellent leadership, communication, and vendor management skills.
Preferred Skills
Familiarity with AI workload orchestration tools (Kubernetes, Slurm).
Certifications: CDCP, CDCS, ITIL, or other data center management credentials.
Experience with environmental sustainability practices in data centers.
Benefits
Medical insurance
Vision insurance
401(k)
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Other