Berkeley Lab
NERSC is seeking a versatile Linux System / Platform Engineer to build and manage Linux-based infrastructure for the world’s largest supercomputers. In this role you will develop container and virtual machine platforms, deploy systems that keep the supercomputing center running smoothly, and enable researchers to leverage scientific research tools, authentication, identity and access management, databases, and more.
What You Will Do (Level 3)
Work with a team to build and manage Linux systems and storage infrastructure.
Troubleshoot and solve complex technical problems with other team members.
Install, upgrade, and secure equipment and services.
Develop and refactor scripts and other code.
Participate in 24x7 on‑call rotation.
Coordinate small project teams or other initiatives such as the rollout of a new service or system, or a major equipment or software upgrade.
Work with vendors to prioritize efforts and enhance their technologies to meet user needs.
Work with researchers to deploy services using Spin, our container cloud platform based on Kubernetes.
Collaborate within NERSC and across the DOE community to develop services and integrate them into the new NERSC supercomputer Doudna, the NERSC data center environment, and multiple DOE facilities.
Present developments to NERSC staff and the broader HPC community at science conferences and industry meetings.
Additional Responsibilities (Level 4)
Analyze and solve complex technical problems requiring in‑depth evaluation of variable factors.
Work at a higher level of independence while carrying out work assignments.
Research, select, and lead the implementation of new technologies.
Develop team strategy and project plans.
Provide leadership and technical guidance to group members and other colleagues at NERSC.
Recommend and lead system improvement efforts that enhance system performance, reliability, and security.
Identify and evaluate emerging HPC technologies and features that could introduce novel capabilities or enhance existing system performance and utility.
Represent NERSC in technical or user advocacy groups to influence the HPC and DOE community to meet user needs.
Qualifications (Level 3)
Typically, 8+ years of related experience with a Bachelor’s degree; alternatively, 6+ years with a Master’s degree; or equivalent career experience.
4+ years of experience managing large‑scale Linux‑based system deployments in a high‑performance computing, cloud computing, or hyper‑scale environment.
Experience with some or all of our key technologies:
containers (such as Docker or Kubernetes)
virtualization (such as Proxmox or VMware)
cloud‑based deployment (such as AWS, Azure, or GCP)
using and developing AI (or machine learning) tools and services
identity and access management
database administration, tuning, and troubleshooting
networked storage systems
backup technologies
Familiarity with automated provisioning systems (such as Chef, Foreman, or Terraform).
Familiarity with configuration management systems (such as Ansible or Puppet).
Working knowledge of Linux system engineering and security practices.
Ability to resolve complex issues in creative and effective ways and derive technical solutions in a collaborative environment to meet end‑user requirements.
Demonstrated ability to work independently as well as collaboratively in large projects, and contribute to an active and respectful intellectual environment.
Creative, positive, and collaborative work style.
Excellent oral and written communication skills.
Additional Requirements (Level 4)
Typically, 12+ years of related experience with a Bachelor’s degree; alternatively, 8+ years with a Master’s degree; or equivalent career experience.
Experience in software engineering or complex scripting.
Experience managing network equipment.
Ability to lead and coordinate projects.
Ability to analyze and resolve significant and unique issues requiring evaluation of multiple intangible factors.
Ability to exercise independent judgment in methods, techniques and evaluation criteria for obtaining results.
Notes
This is a full‑time, career appointment, exempt (monthly paid) from overtime pay.
This position will be hired at a level commensurate with the business needs and the skills, knowledge, and abilities of the successful candidate.
In‑person interviews will consist of standard question and answer sessions and a presentation on a technical topic.
Level 3: The full salary range is $136,440 to $230,244 per year, with a targeted range of $153,492 to $187,596 per year.
Level 4: The full salary range is $155,388 to $262,224 per year, with a targeted range of $174,804 to $213,660 per year.
This position is subject to a background check. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position. Having a conviction history will not automatically disqualify an applicant from being considered for employment.
This position requires substantial on‑site presence. A hybrid work mode is available; individuals must reside within 150 miles of Berkeley Lab. Telework or remote work may be considered in rare cases. A REAL ID or other acceptable form of identification is required to access Berkeley Lab sites.
Equal Employment Opportunity Employer: Berkeley Lab is an Equal Opportunity Employer. We heartily welcome applications from all who could contribute to the Lab’s mission of leading scientific discovery, excellence, and professionalism. In support of our rich global community, all qualified applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, protected veteran status, or other protected categories under State and Federal law.