Macpower Digital Assets Edge

Senior HPC Specialist

Macpower Digital Assets Edge, Denver, Colorado, United States, 80285

Job Summary:

We are seeking a highly skilled and experienced Senior HPC Specialist to design, implement, and maintain high-performance computing (HPC) systems and solutions. The ideal candidate will play a critical role in optimizing computational performance, ensuring the reliability of the infrastructure, and supporting advanced computational workloads in a dynamic and innovative environment.

Key Responsibilities:

HPC System Design & Implementation:

Design and deploy HPC clusters, including compute, storage, and networking components. Evaluate and integrate new HPC technologies to enhance system performance and scalability.

System Administration & Maintenance:

Manage Linux-based HPC systems using job schedulers (e.g., Slurm, PBS, Grid Engine). Monitor system health, troubleshoot issues, and resolve performance bottlenecks. Ensure optimal configuration and high availability of HPC resources.

Performance Optimization:

Profile and fine-tune applications and workloads for peak performance on HPC systems. Analyze job performance and provide recommendations to users for enhancements.

Storage & Data Management:

Administer large-scale parallel file systems (e.g., Lustre, GPFS, BeeGFS). Implement data transfer and storage strategies for high-throughput workloads.

User Support & Collaboration:

Provide technical support and training to researchers and end users. Work with interdisciplinary teams to understand and meet computational requirements.

Security & Compliance:

Adhere to security best practices and compliance standards for HPC systems. Develop and manage backup and disaster recovery solutions.

Required Qualifications:

HPC Expertise:

Proven experience in HPC cluster design, deployment, and management (compute, storage, networking).

Linux Administration:

Proficiency in administering Linux systems (Red Hat, CentOS, Ubuntu).

Job Scheduling:

Hands-on experience with job schedulers like Slurm, PBS, or Grid Engine.

Performance Tuning:

Strong skills in profiling, benchmarking, and optimization techniques.

Parallel File Systems:

Expertise in managing Lustre, GPFS, or BeeGFS.

Scripting & Automation:

Advanced knowledge of scripting languages (Bash, Python, Perl).

User Support:

Ability to produce technical documentation, deliver training, and collaborate with users.

Security:

Knowledge of system hardening, backups, and disaster recovery processes.

Preferred Qualifications:

Collaboration Skills:

Experience working in interdisciplinary teams to meet diverse computational needs.

Innovation:

Passion for exploring emerging HPC technologies to enhance capabilities.