Boston Public Health Commission
The Data Engineer - Casual will support the Boston Public Health Commission’s Data Modernization Initiative (DMI), focusing on building and maintaining data pipelines on the Microsoft Azure platform (Azure Data Factory, Azure Data Lake Gen2), improving data quality, and supporting the development of BPHC’s Azure Data Lake. This role offers hands‑on experience with the cloud data engineering, automation, and governance tools used to modernize public health systems, including writing SQL or Python scripts to clean, transform, and validate datasets prior to storage in the Azure Data Lake.
Key Responsibilities
Assist in building and maintaining ETL/ELT data pipelines that load data into the BPHC Data Lake and Data Warehouse.
Help design and implement data ingestion workflows for structured and unstructured datasets from APIs, external systems, databases, flat files, and public data sources.
Support Data Lake organization, including folder structures, metadata tagging, data partitioning, and schema alignment.
Maintain high‑quality data storage practices including data versioning, lineage tracking, and format optimization (e.g., Parquet, Delta).
Participate in building and optimizing data models, tables, and views used by dashboards and analytic systems.
Support data validation, quality checks, deduplication, and data cleaning processes.
Document data flows, pipeline logic, and transformations for the Data Lake environment.
Assist in automating ingestion and transformation processes using Python, SQL, and Azure‑based tools.
Collaborate with analysts and program teams to understand data needs and implement scalable solutions.
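A minimal sketch of the kind of Python cleaning and validation work described above; the file names and columns are hypothetical, and the Parquet output stands in for data that would be landed in the Azure Data Lake:

# Illustrative sketch only: hypothetical file names and columns.
# Cleans, deduplicates, and validates a raw extract, then writes Parquet
# for downstream loading into the Data Lake.
import pandas as pd

# Load a hypothetical raw extract
df = pd.read_csv("raw_cases.csv", parse_dates=["report_date"])

# Basic cleaning: drop duplicate records and normalize ZIP codes
df = df.drop_duplicates(subset=["case_id"])
df["zip_code"] = df["zip_code"].astype(str).str.zfill(5)

# Simple validation: keep only rows with a parseable report date
df = df[df["report_date"].notna()]

# Write to Parquet (requires pyarrow); in practice this output would be
# uploaded to Azure Data Lake Gen2, e.g. via Azure Data Factory
df.to_parquet("cases_curated.parquet", index=False)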
Qualifications
Foundational programming experience in SQL, Python, or R.
Basic understanding of data pipelines, ETL/ELT processes, and data modeling concepts.
Exposure to cloud platforms such as Microsoft Azure, AWS, or Google Cloud (Azure preferred).
Familiarity with tools like Azure Data Factory, Databricks, or Synapse Analytics is a plus.
Experience with version control (e.g., GitHub) and data visualization tools (e.g., Power BI or Tableau) preferred.
Awareness of data security, privacy, and governance principles (HIPAA, metadata standards, etc.) is a plus.