Ai Squared
Machine Learning Engineer
Washington, DC (Hybrid)
About the Role:
We are seeking a highly skilled Machine Learning Engineer to join our core AI team. In this role, you will focus on deploying, maintaining, and monitoring the AI/ML systems that power our platform. You will work closely with data scientists, data engineers, and product teams to deliver scalable, reliable, production-grade AI solutions. You'll play a critical role in operationalizing large language models (LLMs) and other ML systems, ensuring they run efficiently and securely, with robust monitoring in place.
Key Responsibilities:
- Design, implement, and maintain ML deployment pipelines for scalable production systems.
- Operationalize large language models (LLMs) and other AI/ML models, ensuring high availability and reliability.
- Build robust model monitoring, logging, and alerting systems to track performance and detect drift.
- Partner with data scientists to transition models from research/prototype into production-ready deployments.
- Develop CI/CD pipelines for ML workflows, integrating testing, validation, and automated deployment.
- Optimize runtime performance of ML models across cloud platforms (AWS, GCP, Azure) and distributed systems.
- Apply containerization and orchestration (Docker, Kubernetes) to enable reproducible, scalable systems.
- Collaborate with cross-functional teams to ensure ML systems align with platform goals and business requirements.
Qualifications:
- 5+ years of experience as a Machine Learning Engineer, MLOps Engineer, or similar role.
- Proven experience deploying and maintaining machine learning models in production at scale.
- Hands-on experience with ML lifecycle tooling (MLflow, Kubeflow, SageMaker, Vertex AI, or similar).
- Strong proficiency in Python; familiarity with ML frameworks such as PyTorch or TensorFlow.
- Deep knowledge of containerization (Docker) and orchestration (Kubernetes) for production ML systems.
- Expertise with cloud platforms (AWS, GCP, Azure) for ML deployment and scaling.
- Strong understanding of MLOps best practices, monitoring, and automation.
- Excellent problem-solving skills, with an emphasis on building reliable, scalable systems.
- Strong communication and collaboration skills across technical and non-technical teams.