INFOLOB
Senior Databricks AI Platform SRE
Location:
Atlanta, GA
Duration:
Long Term
Rate:
$ Open/Hour
We are looking for a Senior Databricks AI Platform SRE to join our Platform SRE team. This role will be critical in designing, building, and optimizing a scalable, secure, and developer-friendly Databricks platform to enable Machine Learning (ML) and Artificial Intelligence (AI) workloads at enterprise scale.
You will partner with ML engineers, data scientists, platform teams, and cloud architects to automate infrastructure, enforce best practices, and streamline the end-to-end ML lifecycle using modern cloud-native technologies.
Total Experience – 5+ Years. Bachelor’s or master’s degree in Computer Science, Engineering, or a related field.
Responsibilities
Design and implement secure, scalable, and automated Databricks environments to support AI/ML workloads.
Develop infrastructure-as-code (IaC) solutions using Terraform for provisioning Databricks, cloud resources, and network configurations.
Build automation and self-service capabilities using Python, Java and APIs for platform onboarding, workspace provisioning, orchestration and monitoring.
Collaborate with data science and ML teams to define compute requirements, governance policies, and efficient workflows across dev/qa/prod environments.
Integrate Databricks offerings with cloud-native services on Azure/AWS.
Champion CI/CD and GitOps for managing ML infrastructure and configurations.
Ensure compliance with enterprise security and data governance policies using RBAC, audit controls, encryption, network isolation, and policies.
Monitor platform performance, reliability, and usage, and drive improvements to optimize cost and resource utilization.
Required Skills
Proven experience with Terraform for building and managing infrastructure.
Strong programming skills in Python and Java.
Hands‑on experience with cloud networking, identity and access management, key vaults, monitoring and logging in Azure.
Hands‑on experience with Databricks (Workspace management, Clusters, Jobs, MLflow, Delta Lake, Unity Catalog, Mosaic AI).
Deep understanding of Azure or AWS infrastructure (e.g. IAM, VNets/VPC, Storage, Networks, Compute, Key management, monitoring).
Strong experience in distributed system design, development, and deployment using Agile/DevOps practices.
Experience with CI/CD pipelines (GitHub Actions or similar).
Experience implementing monitoring and observability using Prometheus, Grafana or Databricks-native solutions.
Good communication skills, excellent teamwork experience, ability to mentor and develop more junior developers, including participating in constructive code reviews.
Preferred Skills
Experience in multi‑cloud environments (AWS/GCP) is a bonus.
Experience in working in highly regulated environments (finance, healthcare, etc.) is desirable.
Experience with Databricks REST APIs and SDKs.
Knowledge of MLflow, Mosaic AI, and MLOps tooling.
Working with teams using Scrum, Kanban or other agile practices.
Proficiency with standard Linux command line and debugging tools.
Azure or AWS Certifications.
Seniority level: Mid‑Senior level
Employment type: Contract
Job function: Information Technology
Contact: Please send your resume in Word format, along with the following details, to anand.yalla@infolob.com, or call 972-845-7069 for more information:
Name in Full:
Email ID:
Current Location:
Relocation:
Availability:
Work Authorization:
LinkedIn Profile:
DOB (Month and Day):
Skype ID: