Sev1tech, Inc.
Overview
We are seeking a skilled MLOps Engineer to join our team and ensure the seamless deployment, monitoring, and optimization of AI models in production.
The MLOps Engineer will design, implement, and maintain end-to-end machine learning pipelines, focusing on automating model deployment, monitoring model health, detecting data drift, and managing AI-related logging. This role will involve building scalable infrastructure and dashboards for real-time and historical insights, ensuring models are secure, performant, and aligned with business needs.
Key Responsibilities
Model Deployment: Deploy and manage machine learning models in production using tools like MLflow, Kubeflow, or AWS SageMaker, ensuring scalability and low latency.
Monitoring and Observability: Collect metrics with Prometheus and build and maintain dashboards in Grafana or Kibana to track real-time model health (e.g., accuracy, latency) and historical trends.
Data Drift Detection: Implement drift detection pipelines using tools like Evidently AI or Alibi Detect to identify shifts in data distributions and trigger alerts or retraining.
Logging and Tracing: Set up centralized logging with the ELK Stack or OpenTelemetry to capture AI inference events, errors, and audit trails for debugging and compliance.
Pipeline Automation: Develop CI/CD pipelines with GitHub Actions or Jenkins to automate model updates, testing, and deployment.
Security and Compliance: Apply secure-by-design principles to protect data pipelines and models through encryption and access controls, ensuring compliance with regulations such as GDPR and alignment with frameworks such as the NIST AI RMF.
Collaboration: Work with data scientists, AI Integration Engineers, and DevOps teams to align model performance with business requirements and infrastructure capabilities.
Optimization: Optimize models for production (e.g., via quantization or pruning) and ensure efficient resource usage on cloud platforms like AWS, Azure, or Google Cloud.
Documentation: Maintain clear documentation of pipelines, dashboards, and monitoring processes for cross-team transparency.
Qualifications
Education: Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, or a related field.
Experience:
5+ years in MLOps, DevOps, or software engineering with a focus on AI/ML systems.
Proven experience deploying models in production using MLflow, Kubeflow, or cloud platforms (AWS SageMaker, Azure ML).
Hands-on experience with observability tools like Prometheus, Grafana, or Datadog for real-time monitoring.
Technical Skills:
Proficiency in Python and SQL; familiarity with JavaScript or Go is a plus.
Expertise in containerization (Docker, Kubernetes) and CI/CD tools (GitHub Actions, Jenkins).
Knowledge of time-series databases (e.g., InfluxDB, TimescaleDB) and logging and tracing stacks (e.g., ELK Stack, OpenTelemetry).
Experience with drift detection tools (e.g., Evidently AI, Alibi Detect) and visualization libraries (e.g., Plotly, Seaborn).
AI-Specific Skills:
Understanding of model performance metrics (e.g., precision, recall, AUC) and drift detection methods (e.g., the Kolmogorov–Smirnov (KS) test, the Population Stability Index (PSI)).
Familiarity with AI vulnerabilities (e.g., data poisoning, adversarial attacks) and mitigation tools such as the Adversarial Robustness Toolbox (ART).
Soft Skills:
Strong problem-solving and debugging skills for resolving pipeline and monitoring issues.
Excellent collaboration and communication skills to work with cross-functional teams.
Attention to detail to ensure accurate and secure dashboard reporting.
Preferred Qualifications
Experience with LLM monitoring tools like LangSmith or Helicone for generative AI applications.
Knowledge of data-protection regulations (e.g., GDPR, HIPAA) for secure data handling.
Contributions to open-source MLOps projects, or familiarity with community discussions on X under #MLOps or #AIOps.
We provide equal employment opportunity, including for veterans and individuals with disabilities.