Hedra

Machine Learning Engineer, Training Infrastructure

Hedra, San Francisco, California, United States, 94199

Overview

We are looking for an ML Engineer with 3+ years of experience in high-performance computing systems to manage and optimize our computational infrastructure for training and deploying our machine learning models. The ideal candidate has diverse experience managing ML workloads at scale, supporting our 3DVAE and video diffusion models. We encourage you to apply even if you don’t meet every requirement; curiosity, creativity, and the drive to solve hard problems are valued.

Responsibilities

- Design, implement, and maintain scalable computing solutions for training and deploying ML models, ensuring infrastructure can handle large video datasets.
- Manage and optimize the performance of computing clusters or cloud instances (e.g., AWS, Google Cloud) to support distributed training.
- Ensure infrastructure can handle resource-intensive tasks associated with training large generative models.
- Monitor system performance and implement improvements to maximize efficiency and utilization, using tools like Airflow for orchestration.
- Collaborate with research teams to understand computational needs and provide appropriate solutions, facilitating seamless model deployment.

Qualifications

- Bachelor’s degree in Computer Science, Information Technology, or a related field, with a focus on system administration.
- Experience with cloud platforms such as Amazon Web Services, Google Cloud, or Microsoft Azure.
- Experience with version control and CI/CD processes.
- Knowledge of containerization technologies like Docker and Kubernetes for deployments at scale.
- Understanding of distributed training techniques and scaling models across multi-node clusters, aligned with video generation needs.
- Strong problem-solving and communication skills for collaboration with diverse teams.

Benefits

- Competitive compensation + equity
- 401k (no match)
- Healthcare (Silver PPO Medical, Vision, Dental)
- Lunch and snacks at the office

Additional

- Seniority level: Mid-Senior level
- Employment type: Full-time
- Job function: Engineering and Information Technology
- Industries: Technology, Information and Internet
