Optimal CAE
We are seeking a highly skilled GPU Cluster Software Engineer with strong expertise in VMware and CPU/GPU cluster technologies. This engineer will play a critical role in designing, implementing, and managing high-performance compute clusters that support advanced workloads including AI, ML, and HPC applications.
Key Responsibilities
Design, deploy, and manage enterprise-scale CPU/GPU clusters for high-performance workloads.
Configure, maintain, and optimize VMware virtualization platforms (vSphere, ESXi, vCenter, vSAN).
Integrate GPU virtualization technologies (e.g., NVIDIA GRID, vGPU) into VMware environments.
Perform performance tuning, capacity planning, and resource optimization for compute clusters.
Implement automation and orchestration tools to streamline cluster operations and provisioning.
Monitor, troubleshoot, and optimize cluster performance to ensure system reliability.
Collaborate with research and engineering teams to support compute-intensive applications (AI/ML/HPC).
Ensure system scalability, security, and efficiency across multi-user environments.
Required Skills & Qualifications
Hands-on expertise with VMware virtualization technologies (vSphere, ESXi, vCenter, vSAN).
Proven experience in building and managing CPU/GPU clusters in enterprise or research environments.
Strong knowledge of GPU virtualization (NVIDIA GRID, vGPU) and integration with VMware.
Proficiency in cluster monitoring, troubleshooting, and optimization.
Solid understanding of networking and storage concepts in clustered environments.
Experience supporting compute-intensive workloads such as AI, ML, or HPC.
Familiarity with automation/orchestration tools (e.g., Ansible, Terraform, Kubernetes, or similar).
Excellent problem-solving skills and ability to work in a fast-paced, collaborative environment.
Education
Master's or Ph.D. in Computer Science, Computer Engineering, or a related field.