Yale University

Senior High Performance Computing Administrator

Yale University, New Haven, Connecticut, US, 06540

1. Design, implement and advance core HPC systems such as the HPC provisioning system, the resource-management system, account/user lifecycle management, and user authentication and authorization systems.
2. Design, deploy, configure and support HPC clusters, including compute, networking, parallel storage and backup.
3. Install, administer and maintain hardware, system software, networking, accounts, and security measures.
4. Diagnose and correct system issues, whether these be issues with correct operation or performance.
5. Develop and maintain documentation.
6. Research developments in HPC architecture and new technologies, processes, and methodologies.
7. Determine specifications for new systems, and tailor these to meet research needs.

Required Skill/ability 1: Expertise in administration of HPC Linux clusters, including managing and configuring cluster provisioning and management tools and batch schedulers.

Required Skill/ability 2: Experience with high-speed networking such as InfiniBand and high-speed Ethernet. Experience with large storage systems and parallel file systems such as GPFS and Lustre.

Required Skill/ability 3: Expertise in Linux system administration, including managing the operating system, networking, storage, and security. Expertise in automation and scripting in at least one scripting language.

Required Skill/ability 4: Ability to work in a team environment in a fast-moving technology field. Excellent verbal and written communication skills. Ability to interact well with team members and end users. Ability to work independently and across units.

Required Skill/ability 5: Attention to detail. Ability to take the care necessary to be entrusted with a system that hundreds of users depend on for research computation and the storage of research data.

Work Week: Standard (M-F equal number of hours per day)

Posting Position Title: Senior High Performance Computing Administrator

University Job Title: Senior High Performance Computing Administrator

Preferred Education, Experience and Skills: Experience with GPUs. Ability to specify new systems, especially for AI and ML. Experience configuring, deploying, and supporting large-scale systems in a research environment. Expertise in computer security in large, multi-user Linux environments. Experience with remote administration and with installing and troubleshooting hardware. Bachelor's Degree in a related field and a minimum of six years of related work experience or an equivalent combination of education and experience.
