DaVita Kidney Care
Overview
The MLOps Engineer (GCP Specialization) is responsible for designing, implementing, and maintaining infrastructure and processes on Google Cloud Platform (GCP) to enable the seamless development, deployment, and monitoring of machine learning models at scale. This role bridges data science, data engineering, and infrastructure, ensuring that machine learning systems are reliable, scalable, and optimized for GCP environments.
Responsibilities
Model Deployment: Design and implement pipelines for deploying machine learning models into production using GCP services such as AI Platform, Vertex AI, Cloud Run, and Cloud Composer to ensure high availability and performance.
Infrastructure Management: Build and maintain scalable GCP-based infrastructure using services like Google Compute Engine, Google Kubernetes Engine (GKE), and Cloud Storage to support model training, deployment, and inference.
Automation: Develop automated workflows for data ingestion, model training, validation, and deployment using GCP tools like Cloud Composer and CI/CD pipelines integrated with GitLab and Bitbucket repositories.
Monitoring and Maintenance: Implement monitoring solutions using Google Cloud Monitoring and Logging to track model performance, data drift, and system health, and take corrective actions as needed.
Collaboration: Work closely with data scientists, data engineers, and infrastructure and DevOps teams to streamline the ML lifecycle and ensure alignment with business objectives.
Versioning and Reproducibility: Manage versioning of datasets, models, and code using GCP tools like Artifact Registry or Cloud Storage to ensure reproducibility and traceability of machine learning experiments.
Optimization: Optimize model performance and resource utilization on GCP, leveraging containerization with Docker and GKE, and using cost-efficient resources like preemptible VMs or Cloud TPU/GPU.
Security and Compliance: Ensure ML systems comply with data privacy regulations (e.g., GDPR, CCPA) using GCP’s security tools like Cloud IAM, VPC Service Controls, and Data Loss Prevention (DLP).
Tooling: Integrate GCP-native tools (e.g., Vertex AI, Cloud Composer) and open-source MLOps frameworks (e.g., MLflow, Kubeflow) to support the ML lifecycle.
Qualifications
Technical Skills:
Proficiency in programming languages such as Python.
Expertise in GCP services, including Vertex AI, Google Kubernetes Engine (GKE), Cloud Run, BigQuery, Cloud Storage, Cloud Composer, Dataproc or PySpark, and managed Airflow.
Experience with infrastructure as code (Terraform).
Familiarity with containerization (Docker, GKE) and CI/CD pipelines (GitLab and Bitbucket).
Knowledge of ML frameworks (TensorFlow, PyTorch, scikit-learn), MLOps tools compatible with GCP (MLflow, Kubeflow), and GenAI RAG applications.
Understanding of data engineering concepts, including ETL pipelines with BigQuery, Dataflow, and Dataproc (PySpark).
Soft Skills:
Strong problem-solving and analytical skills.
Excellent communication and collaboration abilities.
Ability to work in a fast-paced, cross-functional environment.
Preferred Qualifications
Experience with large-scale distributed ML systems on GCP, such as Vertex AI Pipelines, Kubeflow on GKE, or Feature Store.
Exposure to Generative AI (GenAI) and Retrieval-Augmented Generation (RAG) applications and deployment strategies.
Familiarity with GCP’s model monitoring tools and techniques for detecting data drift or model degradation.
Knowledge of microservices architecture and API development using Cloud Endpoints or Cloud Functions.
Google Cloud Professional certifications (e.g., Professional Machine Learning Engineer, Professional Cloud Architect).
What We’ll Provide
Our rewards package includes comprehensive benefits and professional development opportunities. Teammates are eligible to begin receiving benefits on the first day of the month following or coinciding with one month of continuous employment.
Comprehensive benefits: Medical, dental, vision, 401(k) match, paid time off, PTO cash out
Family resources, EAP counseling, Headspace access, backup child and elder care, maternity/paternity leave and more
Professional development programs and on-demand virtual leadership and development courses through StarLearning
We strive to be an equal opportunity workplace and comply with state and federal affirmative action requirements. Individuals are recruited, hired, assigned and promoted without regard to race, national origin, religion, age, color, sex, sexual orientation, gender identity, disability, protected veteran status, or any other protected characteristic.
This position will be open for a minimum of three days.
Salary Range: $68,400.00 - $100,400.00 per year.
Colorado Residents: Please do not respond to questions seeking age-identifying information in this initial application.
Seniority level
Entry level
Employment type
Full-time
Job function
Engineering and Information Technology
Industries
Hospitals and Health Care