Jobs via Dice

Senior HPC Linux System Administrator

Atlanta, Georgia, United States, 30383

Overview

The Public Health and Human Services Operation of Leidos is seeking a Senior HPC Linux System Administrator to lead a team of system administrators in managing a high-performance computing (HPC) infrastructure used by public health researchers and scientists. This senior-level position requires extensive Linux expertise combined with a deep understanding of the specialized hardware, software, and networking required for scientific research and large-scale data analysis.

Candidates must be located in the Atlanta, GA area for partial onsite work and be able to obtain a Public Trust Clearance.

The candidate provides secure, always-on infrastructure services that give researchers access to customer-sponsored data hosted on premises and in the cloud, as well as secure access to the HPC resources for scientific research.

Responsibilities

High-performance computing infrastructure management: Deploy, administer, and monitor HPC clusters. Manage multiple petabytes of data using Pure Storage flash storage and AWS S3 Glacier.
Software and resource management: Install, maintain, and upgrade scientific software, libraries, and batch schedulers such as GridEngine and Slurm. Develop effective processes and solutions for sharing resources across multiple research teams.
VMware: Manage the VMware vSphere Foundation for virtual server provisioning, deployment, and configuration, as well as hardware and software implementation and maintenance.
System operations: Perform system monitoring, routine and on-demand security patch management, troubleshooting, and performance tuning.
Project planning and coordination: Advise customers and the Project Manager in designing and documenting technical solutions. Support infrastructure projects from planning to execution, providing status updates. Communicate with internal and client teams to provide technical counsel and alternative designs or processes to leadership.
Automation and scripting: Lead automation efforts to streamline system management tasks using Bash, Python, and configuration management tools (Puppet, Ansible).
Research collaboration: Work with scientists, bioinformatics developers, and principal investigators to understand computational needs and translate scientific goals into technical configurations, including providing technical support to optimize workflows.
System architecture and deployment: Lead the design, integration, and optimization of on-site HPC and cloud resources.
Mentorship and team coordination: Guide and mentor other system administrators on best practices for system administration and troubleshooting; some roles involve managing a team.
Security and compliance: Implement robust security measures, manage access controls, and design architectures that meet compliance standards such as HIPAA or NIST; support SA&A processes.
Disaster recovery and monitoring: Design and implement backup and disaster recovery plans; integrate monitoring and alerting systems to ensure system availability and reliability.

Qualifications

Education and experience: A Bachelor's degree in computer science or a related field, plus 10 years of system administration experience, including at least 7 years designing and operating HPC infrastructure.
Linux expertise: Mastery of Linux systems and administration, including troubleshooting, security, performance monitoring, and various distributions (e.g., Red Hat, Ubuntu) to support scientific computing.
Soft skills: Strong problem-solving and communication skills; ability to collaborate with customers, bioinformatics developers, and researchers; experience leading a team and integrating new technologies into production environments.
Network: Proficiency with routers, switches, gateways, and hubs.
Security: Experience with infrastructure deliverables, continuous diagnostics and mitigation, threat mitigation, incident response, security architecture support, patch management, vulnerability management, risk management, information assurance, and SA&A documentation.
VMware: Experience managing VM infrastructure.
Leadership: Proven ability to plan and coordinate infrastructure support activities and mentor system administrators.
HPC and cluster management: Experience with HPC clusters, job schedulers (Slurm), and high-speed networking (10/40/100 Gb).
Other technical skills: Proficiency in Bash and Python scripting for automation; experience with cloud technologies (hybrid-cloud) and container environments (Docker, Singularity, Kubernetes).

Desired Qualifications

A Master's degree in IT, engineering, or a related field.
Experience with federal government agencies or research organizations.
Large-scale infrastructure design and implementation project experience.
RHCE, RHCA, or equivalent certifications.
Networking knowledge including TCP/IP, UDP, HTTP, DHCP, and DNS; understanding of LAN, WAN, and VPN design and management.
Experience optimizing cloud utilization patterns and migrations from on-premises to hybrid models.
AWS or Azure cloud engineer certification.

Pay Range: $89,700.00 - $162,150.00. The Leidos pay range for this job level is a general guideline and not a guarantee of compensation. Additional factors considered include responsibilities, education, experience, knowledge, skills, abilities, internal equity, market data, and applicable agreements.

Original Posting: September 29, 2025
