Dice is the leading career destination for tech experts at every stage of their careers. Our client, Motion Recruitment Partners, LLC, is seeking the following. Apply via Dice today!
Senior HPC Systems Engineer
A research computing organization that serves as the primary provider of high-performance computing (HPC), storage, and visualization resources for a large academic research community is looking to bring on a Senior HPC Systems Engineer. The team supports thousands of users across hundreds of research groups, enabling advanced research through centrally managed HPC infrastructure, scientific software, and technical expertise. Join a small, highly collaborative systems and operations team responsible for building, operating, and scaling large‑scale HPC environments. This role plays a critical part in maintaining and evolving complex on‑prem and hybrid cloud infrastructure that supports diverse research workloads across multiple scientific disciplines. This is a hybrid position requiring 3 days onsite to support hands‑on data center and infrastructure operations.

Required Skills & Experience
5‑7+ years of experience in systems administration, HPC systems engineering, or related roles
Strong Linux systems administration experience (Red Hat‑based environments)
Experience installing, configuring, and maintaining large compute clusters and servers
Experience with HPC schedulers such as Slurm
Experience managing high‑performance or parallel file systems (e.g., GPFS or similar)
Scripting experience with Bash and Python
Experience using automation and configuration management tools (e.g., Ansible)
Experience troubleshooting hardware, OS, storage, and networking issues in production environments
Familiarity with hybrid infrastructure environments (on‑prem with AWS and/or Google Cloud Platform)
Bachelor's degree in a related technical field or equivalent practical experience

Desired Skills & Experience
Experience in academic, research, or national lab HPC environments
Experience with HPC hardware procurement and lifecycle management
Familiarity with InfiniBand networking and HPC authentication mechanisms
Experience with infrastructure‑as‑code tools (e.g., Terraform)
Experience supporting HIPAA‑compliant or regulated systems
Exposure to AI/ML, scientific computing, or data‑intensive research workloads
Experience supporting heterogeneous hardware environments
Strong documentation and cross‑team collaboration skills

What You Will Be Doing

Tech Breakdown
60% Linux Systems & HPC Infrastructure (Red Hat, cluster administration)
20% Storage, Networking & Performance (GPFS, InfiniBand, monitoring)
15% Automation & Scripting (Bash, Python, Ansible, Terraform)
5% Cloud & Hybrid Integration (AWS, Google Cloud Platform)

Daily Responsibilities
40% Hands‑on systems administration of HPC clusters, servers, and operating systems
20% Monitoring, performance tuning, and troubleshooting of compute, storage, and network components
15% Automation, scripting, patching, and security maintenance
10% User support, access management, and help desk ticket resolution
10% Software deployment, upgrades, backups, and restores
5% Documentation, compliance tracking, and inventory reporting

The Offer
Medical, Dental, and Vision Insurance
Vacation Time

Applicants must be currently authorized to work in the US on a full‑time basis now and in the future.