GenBio AI
High Performance Computing (HPC) Engineer
GenBio AI, Palo Alto, California, United States, 94306
Headquartered in Silicon Valley, we are a newly established start-up where a collective of visionary scientists, engineers, and entrepreneurs is dedicated to transforming the landscape of biology and medicine through the power of Generative AI. Our team comprises leading minds and innovators in AI and Biological Science, pushing the boundaries of what is possible. We are dreamers who reimagine a new paradigm for biology and medicine.
We are committed to decoding biology holistically and enabling the next generation of life-transforming solutions. As the first mover in pan-modal Large Biological Models (LBM), we are pioneering a new era of biomedicine, with our LBM training leading to ground-breaking advancements and a transformative approach to healthcare. Our exceptionally strong R&D team and leadership in LLM and generative AI position us at the forefront of this revolutionary field. With headquarters in Silicon Valley, California, and a branch office in Paris, we are poised to make a global impact. Join us as we embark on this journey to redefine the future of biology and medicine through the transformative power of Generative AI.
Job Description
GPU Cluster Management: Design, deploy, and maintain high-performance GPU clusters, ensuring their stability, reliability, and scalability. Monitor and manage cluster resources to maximize utilization and efficiency.
Distributed/Parallel Training: Implement distributed computing techniques to enable parallel training of large deep learning models across multiple GPUs and nodes. Optimize data distribution and synchronization to achieve faster convergence and reduced training times.
Performance Optimization: Fine-tune GPU clusters and deep learning frameworks to achieve optimal performance for specific workloads. Identify and resolve performance bottlenecks through profiling and system analysis.
Deep Learning Framework Integration: Collaborate with data scientists and machine learning engineers to integrate distributed training capabilities into GenBio AI’s model development and deployment frameworks.
Scalability and Resource Management: Ensure that the GPU clusters can scale effectively to handle increasing computational demands. Develop resource management strategies to prioritize and allocate computing resources based on project requirements.
Troubleshooting and Support: Troubleshoot and resolve issues related to GPU clusters, distributed training, and performance anomalies. Provide technical support to users and resolve technical challenges efficiently.
Documentation: Create and maintain documentation related to GPU cluster configuration, distributed training workflows, and best practices to ensure knowledge sharing and seamless onboarding of new team members.
Job Requirements:
Master’s or Ph.D. degree in computer science or a related field, with a focus on High-Performance Computing, Distributed Systems, or Deep Learning.
2+ years of proven experience in managing GPU clusters, including installation, configuration, and optimization.
Strong expertise in distributed deep learning and parallel training techniques.
Proficiency in popular deep learning frameworks like PyTorch, Megatron-LM, DeepSpeed, etc.
Programming skills in Python and experience with GPU-accelerated libraries (e.g., CUDA, cuDNN).
Knowledge of performance profiling and optimization tools for HPC and deep learning.
Familiarity with resource management and scheduling systems (e.g., SLURM, Kubernetes).
Strong background in distributed systems, cloud computing (AWS, GCP), and containerization (Docker, Kubernetes).
Join us as we embark on this journey to redefine the future of biology and medicine.
We are an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
Seniority level: Not Applicable
Employment type: Full-time
Job function: Human Resources
Industries: Software Development