ETQ

Principal Data Scientist (AI) - Remote (US)

ETQ, Houston, Texas, United States, 77246

This role is part of Hexagon’s ETQ division and focuses on building predictive models, implementing Generative AI and Agentic AI features, and architecting data‑driven solutions for our document‑based compliance management platform.

Responsibilities

Build and deploy Generative AI features using foundation models (AWS Bedrock, OpenAI, Anthropic Claude) and RAG architectures with vector databases for compliance‑document understanding.

Design agentic AI systems that autonomously handle compliance workflows, document review, regulatory mapping, and multi‑step reasoning tasks.

Implement comprehensive LLM evaluation frameworks with automated pipelines, custom metrics, benchmark datasets, and safety guardrails to ensure regulatory compliance.

Build end‑to‑end MLOps pipelines for model training, deployment, monitoring, versioning, and automated retraining with drift detection.

Develop predictive models for compliance risk scoring, regulatory change impact, anomaly detection, and time‑series forecasting.

Write production‑quality Python code for data processing, feature engineering, API development (FastAPI/Flask), and ETL/ELT workflows.

Lead A/B experiments and product analytics to measure AI feature impact and drive data‑driven decision‑making.

Create explainability frameworks (SHAP/LIME) and monitoring dashboards to ensure transparency and regulatory adherence.

Collaborate with cross‑functional teams to translate business needs into ML solutions and communicate insights to stakeholders.

Qualifications

Experience & Education

7+ years in data science, ML engineering, or related roles.

3+ years building NLP/generative AI applications and implementing MLOps in production.

Bachelor’s or Master’s degree in Data Science, Computer Science, Statistics, or related field (PhD preferred).

Track record of deploying ML systems that process large‑scale datasets with proper monitoring and governance.

Skills

Python (5+ years): Pandas, NumPy, scikit‑learn, XGBoost, TensorFlow/PyTorch, Hugging Face Transformers, FastAPI/Flask, MLflow, pytest.

SQL: Advanced proficiency with complex queries, window functions, and optimization.

Machine Learning & NLP: Strong foundation in supervised/unsupervised learning, deep learning, document understanding, text classification, and semantic analysis.

Generative AI & LLMs: Experience with GPT, Claude, Llama, prompt engineering, RAG architectures, and vector databases (Pinecone, Weaviate, Chroma).

MLOps & ModelOps: End‑to‑end experience with ML pipelines, experiment tracking (MLflow, W&B), model versioning, feature stores, drift detection, CI/CD for ML, and Docker.

LLM Evaluation: Experience with evaluation frameworks (RAGAS, DeepEval), custom metrics, benchmark datasets, and human‑in‑the‑loop validation.

Cloud & AWS: SageMaker, Bedrock, S3, Lambda, EC2, CloudWatch.

Statistics & Experimentation: Statistical analysis, A/B testing, causal inference, and experimental design.

Visualization: Tableau, Power BI, or Python visualization libraries.

Preferred Qualifications

Experience with agentic AI frameworks (LangGraph, LangChain, AutoGen, CrewAI).

Knowledge of Life Sciences/regulated industries (FDA, EMA, ISO, GxP) and compliance management systems.

Familiarity with big data tools (Spark, Databricks, Snowflake), orchestration tools (Airflow, Kubeflow), and monitoring tools (Datadog, Prometheus).

Experience with LLM fine‑tuning, document processing libraries, multi‑modal AI, or distributed training.

Understanding of ML governance, bias detection, model risk management, and data privacy regulations (GDPR, CCPA, HIPAA).

Experience working in agile environments with Jira.

AWS ML certifications or similar credentials.

Key Competencies

Strong communication skills to explain complex models to technical and non‑technical audiences.

Ability to work independently and collaboratively in fast‑paced environments.

Proven ability to convert POCs into production‑grade solutions.

Understanding of ethical AI and building trustworthy, explainable systems for regulated environments.

Hexagon does not provide visa sponsorship at any time during employment; applicants requiring sponsorship are advised not to apply.

About Hexagon

Hexagon is a global leader in digital reality solutions, combining sensor, software and autonomous technologies to help clients increase productivity, safety and efficiency across industrial and public sector projects. Hexagon’s Asset Lifecycle Intelligence division empowers customers to unlock data, accelerate digital maturity and create sustainable outcomes.

Why Work for Hexagon

Hexagon’s Asset Lifecycle Intelligence division is recognized as an engaged and enabled workplace and is committed to providing the resources needed for professional growth in a supportive and inclusive environment.

Everyone is Welcome

Our company embraces diversity and inclusion, is an equal opportunity employer, and is committed to fairness and respect for all employees.
