Supermicro
Sr. Solution Engineer - HPC & AI Systems
Supermicro, San Jose, California, United States, 95131
Job Req ID: 27728
About Supermicro:
Supermicro is a Top Tier provider of advanced server, storage, and networking solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, Hyperscale, HPC, and IoT/Embedded customers worldwide. We are the #5 fastest growing company among the Silicon Valley Top 50 technology firms. Our unprecedented global expansion has provided us with the opportunity to offer a large number of new positions to the technology community. We seek talented, passionate, and committed engineers, technologists, and business leaders to join us.
Job Summary:
We are seeking a highly skilled and motivated Senior Solution Engineer to lead efforts in benchmarking, performance tuning, and platform automation for HPC and AI workloads. This role is critical to ensuring our systems meet performance targets for RFQs and support scalable, automated deployment across diverse environments.

Essential Duties and Responsibilities:
Includes the following essential duties and responsibilities (other duties may also be assigned):
* HPC & AI Benchmarking
  o Execute performance benchmarks for HPC and AI workloads, including MLPerf, across various GPU systems.
  o Analyze and optimize system configurations to meet RFQ performance requirements.
* Performance Tuning & Cluster Optimization
  o Identify bottlenecks and implement tuning strategies for compute, memory, and I/O performance.
  o Scale and maintain large compute clusters for high-demand workloads.
* Platform Automation & Infrastructure Engineering
  o Automate deployment and configuration using Ansible, Terraform, and Docker.
  o Develop infrastructure-as-code solutions to support reproducible and scalable environments.
* Backend & Middleware Development
  o Build backend services and middleware to support large-scale distributed deployments.
  o Ensure reliability, modularity, and performance of backend systems.
* CI/CD & DevOps Integration
  o Design and maintain CI/CD pipelines to support agile development and minimize downtime.
  o Collaborate with DevOps teams to streamline software delivery and system updates.
* System Administration & OS Engineering
  o Perform hands-on installation, tuning, and troubleshooting of Linux systems, especially Red Hat-based environments.
  o Manage software-defined storage and networking components including DNS, DHCP, PXE, and cluster provisioning.
  o Maintain and configure Proof-of-Concept (PoC) system components.
  o Support the system certification processes required by ISV partners.
* Containerization & Orchestration
  o Deploy and manage containerized workloads using Kubernetes.
  o Integrate container orchestration with benchmarking and automation workflows.
* Documentation & Collaboration
  o Maintain clear and comprehensive technical documentation.
  o Work closely with cross-functional teams including hardware, QA, and product management.

Qualifications:
* Bachelor's or Master's degree in Computer Science or a related field.
* Minimum 8 years of professional experience with Python and shell scripting.
* Proven experience with HPC and AI benchmarking tools (e.g., MLPerf).
* Strong proficiency in Ansible, Terraform, Docker, and Python.
* Hands-on experience configuring and troubleshooting Linux OS, servers, and network switches.
* Solid understanding of software-defined storage, networking (DNS, DHCP, PXE), system provisioning, and state-of-the-art datacenter operations.
* Excellent problem-solving, documentation, and collaboration skills.
* Ability to work independently and lead technical initiatives.

Salary Range
$170,000 - $190,000
The salary offered will depend on several factors, including your location, level, education, training, specific skills, years of experience, and comparison to other employees already in this role. In addition to a comprehensive benefits package, candidates may be eligible for other forms of compensation, such as participation in bonus and equity award programs.

EEO Statement
Supermicro is an Equal Opportunity Employer and embraces diversity in our employee population. It is the policy of Supermicro to provide equal opportunity to all qualified applicants and employees without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status or special disabled veteran, marital status, pregnancy, genetic information, or any other legally protected status.