ShiftCode Analytics

MLOps Engineer - AWS & Databricks - SGWS

ShiftCode Analytics, Dallas, Texas, United States, 75215

Interview: Virtual (strong candidates with a LinkedIn profile only)
Visa: USC and GC
Hybrid: Dallas, TX and Miramar, FL (local or nearby)

Job Description

Overview:

We're seeking a highly skilled MLOps Engineer with deep expertise in AWS and Databricks to design, implement, and maintain scalable machine learning infrastructure. This role is ideal for someone who thrives in fast-paced environments and is passionate about automating and optimizing the ML lifecycle, from model development to deployment and monitoring.

Primary Responsibilities:

Design, implement, and maintain CI/CD pipelines for ML applications using AWS CodePipeline, CodeCommit, and CodeBuild.
Automate deployment of ML models into production using Amazon SageMaker, Databricks, and MLflow for versioning and lifecycle management.
Develop, test, and deploy AWS Lambda functions for triggering workflows, automating pre/post-processing, and integrating with other AWS services.
Maintain and monitor Databricks model serving endpoints for scalable, low-latency inference.
Orchestrate complex ML pipelines using Airflow (MWAA) or Databricks Workflows, covering ingestion, training, evaluation, and deployment.
Collaborate with Data Scientists and ML Engineers to convert notebooks into reproducible, version-controlled pipelines.
Integrate model monitoring and alerting (drift detection, performance logging) using CloudWatch, Prometheus, or Datadog.
Manage infrastructure-as-code (IaC) via CloudFormation or Terraform for secure, reproducible deployments.
Ensure secure and compliant pipelines using IAM roles, VPC configurations, and secrets management (AWS Secrets Manager or SSM Parameter Store).
Champion DevOps best practices across the ML lifecycle, including canary deployments, rollback strategies, and audit logging.

Minimum Requirements:

4+ years of hands-on MLOps experience deploying ML applications at scale.
Proficient in AWS services: SageMaker, Lambda, CodePipeline, CodeCommit, ECR, ECS/Fargate, and CloudWatch.
Strong experience with Databricks workflows and Model Serving, including MLflow for tracking and deployment.
Proficient in Python and shell scripting; skilled in Docker containerization.
Deep understanding of CI/CD principles for ML, including pipeline testing, data validation, and quality gates.
Experience orchestrating ML workflows using Airflow (open-source or MWAA) or Databricks Workflows.
Familiarity with monitoring/logging stacks: Prometheus, ELK, Datadog, or OpenTelemetry.
Experience deploying models as REST endpoints, batch jobs, and asynchronous workflows.
Strong Git/GitHub skills with experience in automated deployment reviews and rollback strategies.

Nice to Have:

Experience with Feature Stores (e.g., SageMaker Feature Store, Feast).
Familiarity with Kubeflow, SageMaker Pipelines, or Vertex AI.
Exposure to LLM-based models, vector databases, or RAG pipelines.
Knowledge of Terraform or AWS CDK for infrastructure automation.
Experience with A/B testing or shadow deployments for ML models.