ETQ
Principal Data Scientist (AI) – Remote (US)
Hexagon's ETQ division is seeking a hands-on Data Scientist to build predictive models, implement Generative AI and Agentic AI features, and architect data‑driven solutions for our document‑based compliance management platform. This role requires a technical expert who can develop, deploy, and maintain ML systems in production environments.
Responsibilities
- Build and deploy Generative AI features using foundation models (AWS Bedrock, OpenAI, Anthropic Claude) and RAG architectures with vector databases for compliance document understanding; a minimal retrieval sketch follows this list.
- Design agentic AI systems that autonomously handle compliance workflows, document review, regulatory mapping, and multi‑step reasoning tasks.
- Implement comprehensive LLM evaluation frameworks with automated pipelines, custom metrics, benchmark datasets, and safety guardrails that ensure regulatory compliance.
- Build end‑to‑end MLOps pipelines for model training, deployment, monitoring, versioning, and automated retraining with drift detection.
- Develop predictive models for compliance risk scoring, regulatory change impact, anomaly detection, and time‑series forecasting.
- Write production‑quality Python code for data processing, feature engineering, API development (FastAPI/Flask), and ETL/ELT workflows.
- Lead A/B experiments and product analytics to measure AI feature impact and drive data‑driven decision‑making.
- Create explainability frameworks (SHAP/LIME) and monitoring dashboards that ensure transparency and regulatory adherence.
- Collaborate with cross‑functional teams to translate business needs into ML solutions and communicate insights to stakeholders.
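To ground the RAG responsibility above, here is a minimal, hypothetical sketch of the retrieval step: compliance document chunks are indexed in a local Chroma collection and the closest chunks are fetched for a question. The collection name, chunk texts, metadata, and query are illustrative assumptions, not details of ETQ's actual stack.

```python
# Minimal RAG retrieval sketch: index compliance document chunks in a
# local, in-memory Chroma collection, then fetch the chunks most
# relevant to a user question. All texts below are made-up examples.
import chromadb

client = chromadb.Client()  # in-memory instance, suitable for a demo
collection = client.create_collection(name="compliance_docs")

# In a real pipeline these chunks would come from a document splitter.
collection.add(
    ids=["sop-12-s1", "sop-12-s2", "capa-3-s1"],
    documents=[
        "All deviations must be logged within 24 hours of discovery.",
        "Corrective actions require QA sign-off before closure.",
        "CAPA effectiveness checks occur 90 days after implementation.",
    ],
    metadatas=[{"doc": "SOP-12"}, {"doc": "SOP-12"}, {"doc": "CAPA-3"}],
)

# Chroma embeds the query with its default embedding function and
# returns the nearest chunks, which would then be packed into the
# prompt of a foundation model (e.g., via Bedrock) to answer the user.
results = collection.query(
    query_texts=["How quickly must a deviation be recorded?"],
    n_results=2,
)
for doc, meta in zip(results["documents"][0], results["metadatas"][0]):
    print(meta["doc"], "->", doc)
```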
Qualifications
- 7+ years in data science, ML engineering, or related roles.
- 3+ years building NLP/Generative AI applications and implementing MLOps in production.
- Python (5+ years): Production‑level experience with Pandas, NumPy, scikit‑learn, XGBoost, TensorFlow/PyTorch, Hugging Face Transformers, FastAPI/Flask, MLflow, and pytest.
- SQL: Advanced proficiency with complex queries, window functions, and optimization.
- Strong foundation in supervised/unsupervised learning, deep learning, document understanding, text classification, and semantic analysis.
- Hands‑on experience with foundation models, prompt engineering, RAG architectures, and vector databases (Pinecone, Weaviate, Chroma).
- End‑to‑end experience with ML pipelines, experiment tracking (MLflow, W&B), model versioning, feature stores, drift detection (a minimal drift‑check sketch follows this list), CI/CD for ML, and Docker containerization.
- Experience with evaluation frameworks (RAGAS, DeepEval), custom metrics, benchmark datasets, and human‑in‑the‑loop validation.
- Experience with AWS services including SageMaker, Bedrock, S3, Lambda, EC2, and CloudWatch.
- Strong foundation in statistics, A/B testing, causal inference, and experimental design.
- Proficiency with Tableau, Power BI, or Python visualization libraries.
- Preferred: Experience with agentic AI frameworks (LangGraph, LangChain, AutoGen, CrewAI). Knowledge of regulated industries (FDA, EMA, ISO, GxP) and compliance management systems.
- Familiarity with big data tools (Spark, Databricks, Snowflake), orchestration (Airflow, Kubeflow), and monitoring tools (Datadog, Prometheus).
- Understanding of ML governance, bias detection, model risk management, and data privacy regulations (GDPR, CCPA, HIPAA).
- Experience working in agile environments with Jira.
- AWS ML certifications or similar credentials (preferred).
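To illustrate the drift‑detection qualification above, here is a minimal sketch that compares a numeric feature's training distribution against recent production values with a two‑sample Kolmogorov–Smirnov test. The synthetic data, alpha threshold, and retraining trigger are illustrative assumptions rather than a prescribed standard.

```python
# Minimal feature-drift check: compare a feature's training distribution
# to recent production values with a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training data
live_feature = rng.normal(loc=0.3, scale=1.1, size=1_000)   # shifted live data

stat, p_value = ks_2samp(train_feature, live_feature)

# Alpha of 0.01 is an illustrative choice; production monitors often also
# track effect size and alert only on sustained drift across windows.
ALPHA = 0.01
if p_value < ALPHA:
    print(f"Drift detected (KS={stat:.3f}, p={p_value:.2e}); flag for retraining review")
else:
    print(f"No significant drift (KS={stat:.3f}, p={p_value:.2e})")
```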
Key Competencies
- Strong communication skills, with the ability to explain complex models to technical and non‑technical audiences.
- Ability to work independently and collaboratively in fast‑paced environments.
- Proven ability to convert proofs of concept (POCs) into production‑grade solutions.
- Understanding of ethical AI and building trustworthy, explainable systems for regulated environments.