Fluidstack
Data Center Operations Manager - Building Lead
Fluidstack, New York, New York, United States, 10286
About Fluidstack
We build and operate high-performance GPU clusters so the most ambitious teams can move fast, stay focused, and scale without friction. Our clusters power top AI labs, governments, and enterprises. Our customers include Mistral, Poolside, Black Forest Labs, Meta, and more.
Our team is highly motivated and focused on providing a world-class supercomputing experience. We put our customers first in everything we do, working hard not just to win the sale, but to win repeat business and customer referrals.
We hold ourselves and each other to high standards. We expect you to care deeply about the work you do, the products you build, and the experience our customers have in every interaction with us.
You must work hard, take ownership from inception to delivery, and approach every problem with an open mind and a positive attitude. We value effectiveness, competence, and a growth mindset.
About the Role
We are seeking a talented Data Center Operations Manager - Building Lead to lead the management of our flagship AI infrastructure site housing hundreds of thousands of GPUs. You will play a critical role in overseeing the operations and maintenance of our data center infrastructure, ensuring maximum uptime and performance for our world-class GPU supercomputers.
Focus
Lead a team of technicians and engineers to maintain 99.99% uptime for critical AI infrastructure
Manage and prioritize tasks via internal tools (e.g., JIRA and Confluence) to ensure efficient operations
Collaborate with project managers and engineers to plan and execute data center expansions.
Collaborate with internal teams to troubleshoot and perform Root Cause Analysis (RCA) and Corrective Action (CA) for issues
Coordinate the creation of detailed records of hardware issues and resolutions, and communicate effectively with internal teams and vendors.
Work alongside a logistics specialist to maintain an accurate spares inventory and replenish stock as needed to ensure timely repairs.
Participate in new deployments by assisting
Liaise with local colocation partners to fully understand site topology and articulate issues as needed
Analyze routine operational tasks, then create and share written reports on improvement opportunities for tooling and automation.
Drive continuous improvement initiatives across data center operations
About You
7+ years of experience managing large-scale data center operations, preferably with HPC or AI infrastructure
Strong technical background in data center mechanical and electrical systems (power distribution, cooling, fire suppression)
Proven track record of managing teams and multi-million dollar facility budgets
Experience with GPU clusters and understanding of AI workload requirements
Excellent communication skills to interface with technical teams, vendors, and executive stakeholders
Work in a physically challenging environment (sound/vibration/thermal) and be able to lift 50 lbs.
Nice to haves
Background in managing facilities for AI/ML workloads
Experience with sustainability initiatives and PUE optimization
Knowledge of compliance frameworks (SOC2, ISO 27001)
Benefits
Competitive total compensation package (cash + equity).
Retirement or pension plan, in line with local norms.
Health, dental, and vision insurance.
Generous PTO policy, in line with local norms.
Fluidstack is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, protected veteran status, or any other characteristic protected by law. Fluidstack will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.