Optimal CAE

GPU Cluster Software Engineer

Optimal CAE, Warren, Michigan, United States, 48091

We are seeking a highly skilled GPU Cluster Software Engineer with strong expertise in VMware and CPU/GPU cluster technologies. This engineer will play a critical role in designing, implementing, and managing high-performance compute clusters that support advanced workloads including AI, ML, and HPC applications.

Key Responsibilities

Design, deploy, and manage enterprise-scale CPU/GPU clusters for high-performance workloads.
Configure, maintain, and optimize VMware virtualization platforms (vSphere, ESXi, vCenter, vSAN).
Integrate GPU virtualization technologies (e.g., NVIDIA GRID, vGPU) into VMware environments.
Perform performance tuning, capacity planning, and resource optimization for compute clusters.
Implement automation and orchestration tools to streamline cluster operations and provisioning.
Monitor, troubleshoot, and optimize cluster performance to ensure system reliability.
Collaborate with research and engineering teams to support compute-intensive applications (AI/ML/HPC).
Ensure system scalability, security, and efficiency across multi-user environments.

Required Skills & Qualifications

Hands-on expertise with VMware virtualization technologies (vSphere, ESXi, vCenter, vSAN).
Proven experience in building and managing CPU/GPU clusters in enterprise or research environments.
Strong knowledge of GPU virtualization (NVIDIA GRID, vGPU) and integration with VMware.
Proficiency in cluster monitoring, troubleshooting, and optimization.
Solid understanding of networking and storage concepts in clustered environments.
Experience supporting compute-intensive workloads such as AI, ML, or HPC.
Familiarity with automation/orchestration tools (e.g., Ansible, Terraform, Kubernetes, or similar).
Excellent problem-solving skills and ability to work in a fast-paced, collaborative environment.

Education

Master's or Ph.D. in Computer Science, Computer Engineering, or a related field.