ECS Limited
ECS is seeking an AI Integration Engineer to work in our Arlington, VA office.
We are seeking a highly skilled AI Integration Engineer to lead the seamless deployment, monitoring, and optimization of artificial intelligence and machine learning models in production environments. The AI Integration Engineer will design, implement, and maintain end-to-end machine learning pipelines, automating deployment and monitoring processes while ensuring performance, observability, and security. This role focuses on building scalable infrastructure, real-time dashboards, and automated pipelines that support secure, compliant, and efficient AI operations aligned with mission and business objectives.
Responsibilities:
Deploy and manage AI/ML models in production using frameworks such as MLflow, Kubeflow, or AWS SageMaker, ensuring scalability, low latency, and fault tolerance.
Develop and maintain dashboards using Grafana, Prometheus, or Kibana to provide real-time and historical visibility into model health, including accuracy, latency, and performance metrics.
Implement and maintain drift detection pipelines with tools like Evidently AI or Alibi Detect to identify data distribution shifts and trigger model retraining or alerts.
Configure centralized logging systems with the ELK Stack or OpenTelemetry to capture inference events, anomalies, and audit trails for debugging, observability, and compliance.
Design and manage CI/CD pipelines using GitHub Actions or Jenkins to automate model updates, testing, and deployment workflows.
Apply secure-by-design principles to protect AI pipelines and data flows through encryption, access control, and adherence to regulations and frameworks such as GDPR, HIPAA, and the NIST AI RMF.
Work closely with data scientists, AI engineers, and DevOps teams to align model design, deployment performance, and infrastructure optimization.
Enhance model efficiency through quantization, pruning, and performance tuning to maximize resource utilization across hybrid and cloud platforms (AWS, Azure, Google Cloud).
Develop and maintain detailed documentation of deployment pipelines, dashboards, and monitoring procedures to ensure cross-team transparency and continuity.
Required Qualifications:
Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related technical field.
5+ years of experience in MLOps, DevOps, or software engineering with a focus on AI/ML systems.
Proven experience deploying models in production environments using MLflow, Kubeflow, or cloud AI platforms (AWS SageMaker, Azure ML, or Google Cloud Vertex AI).
Hands-on experience with observability and monitoring tools such as Prometheus, Grafana, or Datadog.
Proficiency in Python and SQL; familiarity with JavaScript or Go is advantageous.
Expertise in containerization and orchestration (Docker, Kubernetes) and CI/CD automation (GitHub Actions, Jenkins).
Experience with time-series databases (InfluxDB, TimescaleDB) and logging frameworks (ELK Stack, OpenTelemetry).
Familiarity with drift detection tools (Evidently AI, Alibi Detect) and data visualization libraries (Plotly, Seaborn).
Strong understanding of model evaluation metrics (e.g., precision, recall, AUC) and statistical drift detection methods (e.g., the KS test, PSI).
Awareness of AI security threats (e.g., data poisoning, adversarial attacks) and mitigation using frameworks such as the Adversarial Robustness Toolbox (ART).
Proven problem-solving and debugging skills for resolving pipeline or deployment issues.
Excellent collaboration and communication skills with cross-functional teams and stakeholders.
High attention to detail to ensure accuracy, traceability, and compliance in dashboard reporting and pipeline documentation.
Must be a U.S. Citizen and eligible to obtain a Department of Homeland Security (DHS) EOD clearance (requires a favorable background investigation).