SAIC
Description
We are seeking a versatile SRE/MLOps Engineer with DevSecOps expertise to design, automate, and operate secure, scalable, and repeatable model deployment workflows across the AI/ML Common Services environment. This role bridges infrastructure reliability, CI/CD automation, and model operations, enabling IRS mission teams to move from experimentation to production with confidence. The engineer will not only support ML lifecycle operations (Databricks, MLflow, AWS SageMaker/Bedrock) but also bring DevSecOps rigor to ensure compliance, monitoring, and infrastructure-as-code are embedded in every step. By partnering with Infrastructure, Security, and Architecture teams, this role ensures the AAP environment is resilient, automated, and compliance-ready at enterprise scale.
Key Responsibilities
Enable secure, scalable, and repeatable deployment workflows for both ML models and supporting infrastructure.
Build and maintain runtime environments, service accounts, and orchestration logic for Databricks, MLflow, and AWS AI services.
Implement and maintain CI/CD pipelines (Bitbucket, Bamboo, Jenkins, or equivalent) for code, data, and model deployments.
Apply DevSecOps practices by integrating security scans, compliance checks, and audit logging into deployment pipelines.
Collaborate with the Infrastructure DSO and Solutions Architect to integrate Terraform-based IaC for consistent, automated provisioning.
Implement observability, alerting, and logging (CloudWatch, Datadog, Prometheus) to monitor both application and ML workloads.
Align infrastructure with ML lifecycle needs, including staging, promotion, rollback, retraining, and compliance-aware tracking.
Develop automation templates, reusable workflows, and guardrails to accelerate onboarding of mission team models while ensuring security.
Contribute to incident response, performance tuning, and reliability engineering across ML and non-ML workloads.
Qualifications
Required Qualifications
Bachelor’s or master’s degree in computer science, data engineering, or a related technical discipline.
5+ years of experience in Site Reliability Engineering, DevOps, or MLOps with production-grade systems.
Must be a U.S. Citizen with the ability to obtain and maintain a Public Trust security clearance.
Hands-on experience with Databricks, MLflow, or AWS SageMaker/Bedrock for ML model lifecycle operations.
Strong proficiency in Terraform, CI/CD pipelines, and container orchestration (Docker, Kubernetes).
Experience implementing security automation (e.g., IaC scanning, container security, SAST/DAST tools) within CI/CD workflows.
Solid understanding of observability stacks (logs, metrics, tracing) and operational best practices.
Desired Skills
Active IRS clearance highly desired.
Experience in federal or regulated environments with security, audit, and compliance requirements (FedRAMP, NIST 800-53).
Knowledge of Trustworthy AI monitoring (bias detection, drift monitoring, explainability).
Familiarity with Unity Catalog, Delta Lake, and data pipeline orchestration in Databricks.
Hands-on experience with Zero Trust security models and secure boundary implementations.
Relevant certifications such as:
Databricks Certified Machine Learning Professional
AWS DevOps Engineer – Professional
Certified Kubernetes Administrator (CKA)
Security+ or equivalent security certification
Target salary range: $120,001 - $160,000. The estimate displayed represents the typical salary range for this position based on experience and other factors.