Nextonic Solutions LLC
High-Performance Computing Systems Engineer
Nextonic Solutions LLC, Rockville, Maryland, US, 20849
Overview
Nextonic Solutions is seeking a High-Performance Computing (HPC) Systems Engineer to join our vibrant team at the National Institutes of Health (NIH) supporting the National Center for Advancing Translational Sciences (NCATS), located in Rockville, MD.
The High-Performance Computing (HPC) Systems Engineer will support the Scientific Computing and Informatics (SCI) team at the National Center for Advancing Translational Sciences (NCATS).
This role will focus on the design, optimization, security, and maintenance of HPC and cloud-based infrastructures that enable cutting-edge biomedical research through scalable, secure, and high-performing computing environments.
Responsibilities
Design, configure, and maintain scalable HPC clusters for optimal performance.
Support documentation and ATO (Authority to Operate) processes.
Ensure infrastructure design compliance with federal security standards and best practices.
Implement monitoring tools such as XDMoD for transparency and user reporting.
Integrate platforms such as JupyterHub and job schedulers (e.g., Slurm) for improved interactivity.
Develop and manage AWS-based infrastructure using Terraform, Packer, and Ansible.
Automate deployment workflows to streamline provisioning, updates, and scaling.
Manage systems involved in AWS Secure Cloud Bridging (SCB) and STRIDES initiatives.
Implement CIS benchmark-aligned system hardening using OpenSCAP.
Administer optimized compute images (CPU/GPU) for scientific workflows.
Leverage tools such as OpenHPC, Warewulf, and Ansible for environment management.
Lead and coordinate quarterly patch cycles.
Partner with researchers and external stakeholders on critical projects.
Facilitate solution transitions to other NIH centers and collaborators.
Contribute to publications and team objectives through deep technical engagement.
Qualifications
Experience with federal ATO processes (required)
HPC architecture and performance optimization (required)
Scientific software development and deployment
High-speed network and parallel file system architecture
Troubleshooting, diagnostics, and technical support
Strong communication and multitasking skills
Programming & Scripting
Languages – Pascal, BASIC, Delphi, Visual Basic, C, C++
Scripting – Bash, Perl, Python, Ruby, PEAR, Tcl
Systems & Network Administration
Linux – RHEL/CentOS, SUSE, Debian, Ubuntu
Windows – 95 through 10; NT through Server 2016
Networking – Active Directory, TCP/IP v4/v6, DHCP, DNS, WINS
Legacy – NOVELL 3.1–5, VPN, Citrix, Terminal Services
Monitoring & Management Tools
Nagios, Ganglia, HP BAC, Precise i3
SGI SMC, HP PCM, Bright Cluster Manager (incl. Data Analytics)
Infrastructure & Automation
Puppet, Cobbler, Ansible, Chef
Red Hat Satellite, Kickstart, RPM optimization
File Systems & Archiving
Panasas (DirectFlow/panfs), DDN (GPFS), SGI DMF, StorHouse/RFS (Filetek)
HPC Tools & Job Scheduling
MOAB/MAUI, Torque, PBS Pro, Windows HPC Scheduler
Visualization & Remote Access
NICE DCV, EnginFrame, VNC, OpenText Exceed onDemand, Web Remote Desktop
Containerization & GPU
Docker, Kubernetes, Kubeflow, NVIDIA DGX-1 GPU systems
Databases
SQL Server (2000–2008), MySQL, Zope
High-Speed Networking
InfiniBand, Mellanox, OFED, Voltaire, Force10
Proven experience in
HPC architecture and performance tuning
Cybersecurity in HPC/cloud environments
Infrastructure as Code (AWS, Terraform, Ansible, Packer)
Supporting scientific workflows in research environments