Nextonic Solutions LLC

High-Performance Computing Systems Engineer

Nextonic Solutions LLC, Rockville, Maryland, US 20849


Overview

Nextonic Solutions is seeking a High-Performance Computing (HPC) Systems Engineer to join our vibrant team at the National Institutes of Health (NIH), supporting the National Center for Advancing Translational Sciences (NCATS) in Rockville, MD.

The HPC Systems Engineer will support the Scientific Computing and Informatics (SCI) team at NCATS.

This role will focus on the design, optimization, security, and maintenance of HPC and cloud-based infrastructures that enable cutting-edge biomedical research through scalable, secure, and high-performing computing environments.

Responsibilities

Design, configure, and maintain scalable HPC clusters for optimal performance.

Support documentation and ATO (Authority to Operate) processes.

Ensure infrastructure design compliance with federal security standards and best practices.

Implement monitoring tools such as XDMoD for transparency and user reporting.

Integrate platforms such as JupyterHub and job schedulers (e.g., Slurm) for improved interactivity.

Develop and manage AWS-based infrastructure using Terraform, Packer, and Ansible.

Automate deployment workflows to streamline provisioning, updates, and scaling.

Manage systems involved in AWS Secure Cloud Bridging (SCB) and STRIDES initiatives.

Implement CIS benchmark-aligned system hardening using OpenSCAP.

Administer optimized compute images (CPU/GPU) for scientific workflows.

Leverage tools such as OpenHPC, Warewulf, and Ansible for environment management.

Lead and coordinate quarterly patch cycles.

Partner with researchers and external stakeholders on critical projects.

Facilitate solution transitions to other NIH centers and collaborators.

Contribute to publications and team objectives through deep technical engagement.

Qualifications

Experience with federal ATO processes (required)

HPC architecture and performance optimization (required)

Scientific software development and deployment

High-speed network and parallel file system architecture

Troubleshooting, diagnostics, and technical support

Strong communication and multitasking skills

Programming & Scripting

Languages – Pascal, BASIC, Delphi, Visual Basic, C, C++

Scripting – Bash, Perl, Python, Ruby, PEAR, Tcl

Systems & Network Administration

Linux – RHEL/CentOS, SUSE, Debian, Ubuntu

Windows – 95 through 10; NT through Server 2016

Networking – Active Directory, TCP/IP v4/v6, DHCP, DNS, WINS

Legacy – Novell 3.1–5, VPN, Citrix, Terminal Services

Monitoring & Management Tools

Nagios, Ganglia, HP BAC, Precise i3

SGI SMC, HP PCM, Bright Cluster Manager (incl. Data Analytics)

Infrastructure & Automation

Puppet, Cobbler, Ansible, Chef

Red Hat Satellite, Kickstart, RPM optimization

File Systems & Archiving

Panasas (DirectFlow/PanFS), DDN (GPFS), SGI DMF, StorHouse/RFS (FileTek)

HPC Tools & Job Scheduling

Moab/Maui, Torque, PBS Pro, Windows HPC Scheduler

Visualization & Remote Access

NICE DCV, EnginFrame, VNC, OpenText Exceed OnDemand, Web Remote Desktop

Containerization & GPU

Docker, Kubernetes, Kubeflow, NVIDIA DGX-1 GPU systems

Databases

SQL Server (2000–2008), MySQL, Zope

High-Speed Networking

InfiniBand, Mellanox, OFED, Voltaire, Force10

Proven experience in:

HPC architecture and performance tuning

Cybersecurity in HPC/cloud environments

Infrastructure as Code (AWS, Terraform, Ansible, Packer)

Supporting scientific workflows in research environments
