Keck Medicine of USC

Machine Learning Ops Engineer

Keck Medicine of USC, Los Angeles, California, United States, 90079

Summary: Under the direction of Information Services Leadership, the incumbent will be responsible for the full lifecycle management of machine learning models, including their design, build, and maintenance. The MLOps Engineer will play an integral role in implementing artificial intelligence solutions across Keck Medicine of USC. The incumbent will partner with data scientists, data team members, and clinical operations to deploy, monitor, and maintain machine learning solutions that improve patient care, support operational excellence, and advance clinical research. Drawing on DevOps experience, the incumbent will ensure seamless integration, deployment, automation, and scaling of AI solutions within the existing infrastructure, and will maintain and continuously improve MLOps pipelines for monitoring, versioning, and deploying models in production environments. The MLOps Engineer will implement best practices for testing, debugging, and performance monitoring of AI systems to ensure reliability and scalability.

Minimum Education: Bachelor’s degree in computer science, artificial intelligence, informatics, or a closely related field. Master’s degree in computer science, engineering, or a closely related field preferred.

Minimum Experience: 3 or more years of relevant Machine Learning Engineer experience. Proven experience with: artificial intelligence and machine learning platforms (e.g., AWS, Azure, or GCP); containerization technologies (e.g., Docker) or container orchestration platforms (e.g., Kubernetes); CI/CD tools (e.g., GitHub Actions); programming languages and frameworks (e.g., Python, R, SQL); MLOps engineering principles, agile methodologies, and DevOps lifecycle management; technical writing and documentation for AI/ML models and processes; and healthcare data and machine learning use cases. Healthcare Expertise: Understanding of healthcare regulations and standards, and familiarity with Electronic Health Records (EHR) systems, including integrating machine learning models with these systems. Experience in managing the end-to-end ML lifecycle. Deep understanding of coding, architecture, and deployment processes.

Accountabilities:
Production Deployment and Model Engineering: Proven experience in deploying and maintaining production-grade machine learning models, with real-time inference, scalability, and reliability.
Scalable ML Infrastructures: Proficiency in developing end-to-end scalable ML infrastructures on premises or on cloud platforms such as Amazon Web Services (AWS), Google Cloud Platform (GCP), or Azure.
Engineering Leadership: Ability to lead engineering efforts in creating and implementing methods and workflows for ML/GenAI model engineering, LLM advancements, and optimizing deployment frameworks while aligning with business strategic directions.
AI Pipeline Development: Experience in developing AI pipelines for various data processing needs, including data ingestion, preprocessing, and search and retrieval, ensuring solutions meet all technical and business requirements.
Collaboration: Demonstrated ability to collaborate with data scientists, data engineers, analytics teams, and DevOps teams to design and implement robust deployment pipelines for continuous improvement of machine learning models.
Continuous Integration/Continuous Deployment (CI/CD) Pipelines: Expertise in implementing and optimizing CI/CD pipelines for machine learning models, automating testing and deployment processes.
Monitoring and Logging: Competence in setting up monitoring and logging solutions to track model performance, system health, and anomalies, allowing for timely intervention and proactive maintenance.
Version Control: Experience implementing version control systems for machine learning models and associated code to track changes and facilitate collaboration.
Security and Compliance: Knowledge of ensuring machine learning systems meet security and compliance standards, including data protection and privacy regulations.
Documentation: Skill in maintaining clear and comprehensive documentation of MLOps processes and configurations.