C the Signs

AI Data Engineer

C the Signs, Boston, Massachusetts, US, 02298

Position Summary

The AI Data Engineer will play a crucial role in developing and refining the data used to train and fine-tune our LLMs and machine learning models. This individual will be responsible for the entire data lifecycle, including gathering, cleaning, structuring, and optimizing large, diverse healthcare datasets. The ideal candidate will have a strong background in data engineering principles, experience with big data technologies, and a keen understanding of the unique challenges and requirements of healthcare data.

You will design, build, and maintain scalable data pipelines that source, preprocess, and deliver high-quality, high-volume datasets to our machine learning engineers. This role requires a deep understanding of data engineering best practices coupled with specific knowledge of the data requirements for LLM training and refinement.

Key Responsibilities

Collaborate with data scientists and machine learning engineers to understand data requirements for LLM and machine learning model fine-tuning

Design, build, and maintain scalable data pipelines to ingest, process, and store massive and diverse healthcare datasets

Implement robust data validation and monitoring to ensure the integrity, accuracy, and consistency of all training datasets

Implement robust data cleaning and transformation processes to ensure data quality

Develop and optimize data structures and schemas for efficient access and utilization by LLMs and machine learning models

Work with the team to identify and acquire new data sources, ensuring compliance with relevant healthcare regulations (e.g., HIPAA)

Monitor data pipeline performance, troubleshoot issues, and implement optimizations to improve efficiency and reliability

Document data engineering processes, data models, and data dictionaries

Stay up to date with the latest advancements in data engineering, big data technologies, and machine learning

Requirements

Required

Bachelor's degree in Computer Science, Engineering, or a related field

Proven experience as a Data Engineer, with a focus on big data technologies

Strong proficiency in programming languages such as Python, Scala, or Java

Extensive experience with data warehousing, ETL processes, and data modeling

Experience with major cloud providers (e.g., AWS, GCP, Azure) and their data storage and processing services

Hands‑on experience with big data frameworks like Apache Spark for distributed processing

Excellent problem‑solving skills and the ability to work independently and as part of a team

Strong communication and interpersonal skills

Preferred

Master's degree in a related field

Experience with healthcare data and a good understanding of healthcare data standards (e.g., FHIR, HL7)

Familiarity with machine learning concepts and LLM fine‑tuning processes

Experience with data orchestration tools (e.g., Apache Airflow)

Work Authorization

Must be a US citizen, a Green Card holder, or currently in the US on a valid H-1B visa

Benefits

Competitive salary and benefits package

Flexible working arrangements (remote or hybrid options available)

The opportunity to work on life‑changing AI technology that directly impacts patient outcomes

A team that combines cutting‑edge innovation with a mission to save lives and improve health equity

Continuous learning opportunities with access to the latest tools and advancements in AI and healthcare
