Hewlett Packard Enterprise Company
Systems Analyst (/Site Reliability Engineer)
Hewlett Packard Enterprise Company, Clinton, Iowa, US, 52734
Overview
Systems Analyst (/Site Reliability Engineer) – This role is designed as onsite with an expectation to primarily work from an HPE partner/customer office.
Who We Are
Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work. We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today’s complex world. Our culture values varied backgrounds, flexibility, bold moves, and inclusivity.
Job Description
Position Overview:
We are seeking a skilled Systems Analyst (/Site Reliability Engineer) at HPE to support Oak Ridge National Laboratory (ORNL). This is an onsite, customer-facing opportunity to work with advanced high-performance computing (HPC) systems, including Frontier. You will be involved in the deployment, maintenance, and optimization of large-scale computing software infrastructure and hardware to ensure system reliability for scientific research.
Responsibilities
Maintain and optimize compute infrastructure across multiple large-scale HPC systems.
Participate in the deployment, testing, and validation of live high-performance computing clusters.
Troubleshoot node failures by analyzing OS internals, compiler behavior, and system logs, coordinating with internal subject-matter experts as needed.
Conduct routine and on-demand maintenance, troubleshooting, and performance tuning for large-scale HPC environments.
Collaborate with researchers, engineers, and technical staff to open, maintain, and close Jira tickets, ensuring system reliability and efficiency for high-stakes, high-performance scientific research.
Investigate and document complex software and system-level issues, acting as a bridge between users and HPE internal teams.
Develop and implement automation tools, scripts, and monitoring solutions to streamline system management.
Stay up-to-date with advancements in HPC technologies, including GPU acceleration (e.g., ROCm), parallel computation (Cray PE, MPI/OpenMP), and performance tuning.
Requirements
Due to the nature of the work, this position requires either U.S. Citizenship or U.S. Lawful Permanent Resident (LPR) status.
Bachelor’s in Computer Science, Computer Engineering, or a related field, with at least 2 years of experience, OR a Master’s in Computer Science, Computer Engineering, or a related field.
HPC System Experience: Experience using SLURM-based HPC systems, both as a user and preferably as a system administrator.
Technical Skills: Proficient in Linux, Python, and Bash scripting. Familiarity with C++/Fortran-based HPC application development, GPUs, MPI, and high-performance computing tools.
Application Build and Configuration Knowledge: Strong understanding of application build processes, including compiler configurations, library integration, and dependency management, to effectively set up environments, perform upgrades, and troubleshoot build and runtime issues.
Log Analysis: Experience in large-scale log analysis and in troubleshooting performance problems, bugs, or system failures.
Communication Skills: Strong written and verbal communication skills, with the ability to document and share knowledge effectively with internal teams and end-users.
Industry Knowledge: Familiarity with emerging HPC trends, system architectures, and optimization strategies.
What We Can Offer You
Health & Wellbeing: We strive to provide a comprehensive benefits package that supports physical, financial, and emotional wellbeing.
Personal & Professional Development: We invest in your career with programs to help you reach your goals, whether you want to become a knowledge expert or apply your skills to another division.
Unconditional Inclusion: We are inclusive and value varied backgrounds. We offer flexibility to manage work and personal needs and strive to be a force for good.
Contact and Additional Information
Let’s stay connected: Follow @HPECareers on Instagram for updates about people, culture, and tech at HPE. Equal Employment Opportunity statements apply as standard.