Harbor Pointe Consulting, Inc.
Senior HPC Administrator - FULL TIME PERM ROLE/NOT CONTRACT
Harbor Pointe Consulting – High Performance Computational & Data Ecosystem

Harbor Pointe Consulting is seeking an experienced Senior High Performance Computing (HPC) Administrator to support a large-scale computational and data science ecosystem serving advanced research initiatives. This ecosystem includes HPC clusters, research databases, and software development infrastructure powering both local and national projects.

The Senior Administrator will combine deep technical expertise with a strong commitment to customer service, ensuring researchers and collaborators have reliable access to world-class computing resources. This role requires an expert troubleshooter who thrives in dynamic environments and can drive projects to completion with minimal supervision. The position reports to the Director of Computational & Data Ecosystem Services.

Key Responsibilities
- Design, deploy, and maintain a large-scale computational and data science ecosystem including ~30,000 cores with high-bandwidth, low-latency interconnects, GPUs, large shared-memory nodes, databases, research workflows, and more than 30 petabytes of production storage.
- Lead troubleshooting and resolution for technical issues across applications, systems, hardware, software, and networking, while actively monitoring system health.
- Manage and optimize computational, data, cloud, and workflow technologies for researchers and external collaborators, defining and implementing a forward-looking vision for computational resources.
- Perform full-spectrum system administration, including hardware/software configuration, configuration management, monitoring (with regression testing), usage reporting, performance tuning (file systems, schedulers, interconnects, availability), security, and networking.
- Collaborate cross-functionally with IT, compliance, security, and regulatory stakeholders to ensure best practices and adherence to relevant standards.
- Integrate HPC resources with laboratory and research technologies (e.g., sequencers, clinical/research data pipelines), ensuring seamless data and compute connectivity.
- Deploy and optimize resource management and scheduling software, and tune and upgrade parallel file systems and data-oriented resources.
- Implement and manage robust security infrastructure, including policies, procedures, and monitoring.
- Develop and enforce backup and disaster recovery strategies in line with industry best practices.
- Contribute to financial sustainability through budgeting, cost analysis, and input on chargeback/recovery models.
- Support research initiatives by assisting with system design contributions for grant proposals and producing clear technical documentation.
- Provide after-hours support for critical issues and maintain ticket response workflows.
- Actively engage as a collaborative team member across Harbor Pointe Consulting projects and client partnerships.

Qualifications
- Bachelor’s degree in computer science, engineering, or a related field (Master’s or PhD preferred).
- 8 years of progressive HPC system administration and operations experience (preferably with Red Hat/CentOS Linux in a batch HPC cluster environment).
- Expert troubleshooting skills with a strong customer-service orientation.
- Experience with job schedulers such as LSF or Slurm, and with parallel file systems and large-scale storage.
- Networking and security expertise, including InfiniBand and Gigabit Ethernet.
- Familiarity with configuration management systems (xCAT, Puppet, Ansible).
- Experience with databases, web services, and cloud computing environments.
- Scripting and programming proficiency.
- Ability to multitask effectively in a dynamic research or enterprise environment.
- Strong written, oral, and interpersonal communication skills, with the ability to work as a liaison between research and technology teams.

Preferred Experience
- Advanced degree in a relevant discipline.
- Hands-on experience with GPFS, LSF, TSM, IB, and enterprise Ethernet networking.
- Extensive experience with databases and web services.