Cornelis Networks delivers the world's highest-performance scale-out networking solutions for AI and HPC datacenters. Our differentiated architecture seamlessly integrates hardware, software, and system-level technologies to maximize the efficiency of GPU, CPU, and accelerator-based compute clusters at any scale. Our solutions drive breakthroughs in AI & HPC workloads, empowering our customers to push the boundaries of innovation. Backed by top-tier venture capital and strategic investors, we are committed to innovation, performance, and scalability, solving the world's most demanding computational challenges with our next-generation networking solutions.
We are seeking an experienced Senior Linux Infrastructure Manager to lead and manage our engineering computing environment. This hands-on leadership role requires deep technical expertise in Linux systems administration and HPC infrastructure management to support our ASIC development, platform and software engineering teams. The successful candidate will be responsible for the stability, scalability, and performance of compute, storage, and platform infrastructure. The ideal candidate combines technical depth with strong leadership skills and a passion for operational excellence.
Key Responsibilities
- Design, implement, and manage a Linux-based HPC environment with 200+ compute nodes
- Oversee the administration of batch compute systems, such as SLURM or LSF, to optimize workload management
- Manage and optimize NFS systems and storage infrastructure to support engineering workflows
- Oversee observability systems (monitoring, logging, alerting) and drive continuous improvements in automation and root-cause analysis
- Drive adoption of "Infrastructure as Code" and automated workflows to reduce manual intervention
- Implement and enforce best practices for system availability, performance tuning, capacity planning, and lifecycle management
- Ensure high availability and performance of critical infrastructure services including VNC, NFS, license servers and GitHub
- Collaborate with engineering teams to understand compute requirements and optimize infrastructure accordingly
- Lead capacity planning and infrastructure expansion initiatives
- Manage the staff responsible for on-premises hardware installation, maintenance, and monitoring
- Drive adoption of AI within the infrastructure team and its workflows
Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field (Master's preferred)
- Minimum of 10 years of experience in Linux systems administration, with a focus on HPC environments
- Deep expertise with HPC workload managers (SLURM or LSF)
- Strong knowledge of NFS and distributed storage systems
- Experience implementing and managing monitoring solutions for large-scale computing environments
- Proficiency with infrastructure automation tools and scripting languages (Python, Bash, etc.)
- Strong troubleshooting, problem-solving, and leadership skills
- Hands-on technical expertise to drive root-cause analysis and remediation of issues
- Experience with ASIC and software development workflows, EDA tools, and software development environments
- Experience with Ansible or similar tools for deploying applications or orchestrating workflows
- Experience with CI/CD pipelines and DevOps practices
- Familiarity with containerization technologies (Docker, Singularity)
- Experience with performance tuning and optimization of HPC workloads
- Experience installing and maintaining locally hosted LLMs for AI training and inference
- Experience with cloud-based infrastructure
Location: This is a remote position for employees residing within the United States.
We offer a competitive compensation package that includes equity, cash, and incentives, along with health and retirement benefits. Our dynamic, flexible work environment provides the opportunity to collaborate with some of the most influential names in the semiconductor industry.
At Cornelis Networks, your base salary is only one component of your comprehensive total rewards package. Your base pay will be determined by factors such as your skills, qualifications, experience, and location relative to the hiring range for the position. Depending on your role, you may also be eligible for performance-based incentives, including an annual bonus or sales incentives.
In addition to your base pay, you'll have access to a broad range of benefits, including medical, dental, and vision coverage, as well as disability and life insurance, a dependent care flexible spending account, accidental injury insurance, and pet insurance. We also offer generous paid holidays, 401(k) with company match, and Open Time Off (OTO) for regular full-time exempt employees. Other paid time off benefits include sick time, bonding leave, and pregnancy disability leave.
Cornelis Networks does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. Cornelis Networks is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity or expression, pregnancy, age, national origin, disability status, genetic information, protected veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under applicable laws throughout all stages of the recruitment and selection process.