Palladian Partners, Inc.
Principal Data Engineer – ML Platforms
Palladian Partners, Inc., Arlington, Virginia, United States, 22201
Overview
Altarum | Data & AI Center of Excellence (CoE)
Altarum is building the future of data and AI infrastructure for public health. We are hiring a Principal Data Engineer – ML Platforms to design, build, and operationalize modern data and ML platform capabilities that power analytics, evaluation, AI modeling, and interoperability across all Altarum divisions.
What You'll Work On
ML Platform Engineering: lakehouse architecture, pipelines, MLOps lifecycle
Applied ML Enablement: risk scoring, forecasting, Medicaid analytics
NLP/Generative AI Support: RAG, vectorization, health communications
Causal ML Operationalization: evaluation modeling workflows
Responsible/Trusted AI Engineering: model cards, fairness, compliance
Responsibilities
Platform Architecture & Delivery
Design and operate a modern, cloud-agnostic lakehouse using object storage, SQL/ELT engines, and dbt.
Build CI/CD pipelines for data, dbt, and model delivery (GitHub Actions, GitLab, Azure DevOps).
Implement MLOps systems: MLflow or equivalent, feature stores, model registry, drift detection, automated testing.
Engineer solutions in AWS GovCloud today, with portability to Azure Gov or GCP.
Use IaC (Terraform, CloudFormation, Bicep) to automate secure deployments.
Pipelines & Interoperability
Build scalable ingestion and normalization pipelines for healthcare and public health datasets, including FHIR R4 / US Core (preferred), HL7 v2 (preferred), Medicaid/Medicare claims & encounters (preferred), SDOH & geospatial data (preferred), survey and qualitative data.
Create reusable connectors, dbt packages, and data contracts for cross-division use.
Publish clean, conformed, metrics-ready tables for Analytics Engineering and BI teams.
Support Population Health in turning evaluation and statistical models into pipelines.
Data Quality, Reliability & Cost Management
Define SLOs and alerting; instrument lineage & metadata; ensure ≥95% of data tests pass.
Perform performance and cost tuning (partitioning, storage tiers, autoscaling) with guardrails and dashboards.
Applied ML Enablement & Generative AI
Build production-grade pipelines for risk prediction, forecasting, cost/utilization models, and burden estimation.
Develop ML-ready feature engineering workflows and support time-series/outbreak detection models.
Integrate ML assets into standardized deployment workflows.
Build ingestion and vectorization pipelines for surveys, interviews, and unstructured text.
Support RAG systems for synthesis, evaluation, and public health guidance.
Enable secure, controlled-generation environments.
Causal ML & Evaluation Engineering
Translate R/Stata/SAS evaluation code into reusable pipelines.
Build templates for causal inference workflows (DID, AIPW, CEM, synthetic controls).
Support operationalization of ARA’s applied research methods at scale.
Responsible AI, Security & Compliance
Implement Model Card Protocol (MCP) and fairness/explainability tooling (SHAP, LIME).
Ensure compliance with HIPAA, 42 CFR Part 2, and IRB/DUA constraints, and align with the NIST AI Risk Management Framework (AI RMF).
Enforce privacy-by-design: tokenization, encryption, least-privilege IAM, and VPC isolation.
Reuse, Shared-Services, and Enablement
Develop runbooks, architecture diagrams, repo templates, and accelerator code.
Pair with data scientists, analysts, and SMEs to build organizational capability.
Provide technical guidance for proposals and client engagements.
Your First 90 Days
Platform skeleton operational: repo templates, CI/CD, dbt project, MLflow registry, tests.
Two pipelines in production (e.g., FHIR → analytics and claims normalization).
One end-to-end CoE lighthouse MVP delivered (ingestion → model → metrics → BI).
Completed playbooks for GovCloud deployment, identity/secrets, rollback, and cost control.
Success Metrics (KPIs)
Pipeline reliability meeting SLA/SLO targets.
≥95% data tests passing across pipelines.
MVP dataset onboarding ≤ 4 weeks.
Reuse of platform assets across ≥3 divisions.
Cost optimization and budget adherence.
What You'll Bring
7–10+ years in data engineering, ML platform engineering, or cloud data architecture.
Expert in Python, SQL, dbt, and orchestration tools (Airflow, Glue, Step Functions).
Deep experience with AWS and AWS GovCloud.
CI/CD and IaC experience (Terraform, CloudFormation).
Familiarity with MLOps tools (MLflow, SageMaker, Azure ML, Vertex AI).
Ability to operate in regulated environments (HIPAA, 42 CFR Part 2, IRB).
Preferred
Experience with FHIR, HL7, Medicaid/Medicare claims, and/or SDOH datasets.
Experience with Databricks, Snowflake, Redshift, and/or Synapse.
Event streaming (Kafka, Kinesis, Event Hubs).
Feature store experience.
Observability tooling (Grafana, Prometheus, OpenTelemetry).
Experience optimizing BI datasets for Power BI.
Logistical Requirements
Only candidates eligible to work in the United States and not requiring sponsorship will be considered.
Work must be completed in the continental U.S. unless required by contract.
Employees who live near our offices (Arlington, VA; Silver Spring, MD; Novi, MI) will join in person one day every other month (six times per year) for Collaboration Day.
Must be able to work Eastern Time hours unless otherwise approved by a manager.
Remote workers must have a dedicated, ergonomically appropriate workspace free from distractions and a mobile device that allows for productive business conduct.
Compensation: $144,771–$188,036 a year
This salary range is not a guarantee; the final offer amount will vary based on factors such as skill set and experience.
Benefits
Competitive Medical, Dental and Optical plans
Generous Paid Time Off, 8 company-observed holidays, and 3 floating holidays
Tuition Assistance
401(k) Plan (3% employer contribution plus opportunity for gainsharing)
Life, AD&D & Disability coverage
A flexible work environment
About Altarum
Altarum is a nonprofit focused on improving health for individuals with fewer financial resources and populations disenfranchised by the health care system. We work primarily on behalf of federal and state governments to design and implement solutions that achieve measurable results. We combine expertise in public health and health care delivery with technology development, practice transformation, training and technical assistance, quality improvement, data analytics, and applied research and evaluation. Our innovative solutions and proven processes lead to better value and health for all.
Equal Opportunity
Altarum is an equal opportunity employer that provides employment opportunities to all qualified individuals without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, or any other characteristic protected by applicable law.