Colorectal Cancer Alliance
TITLE:
Senior Data Engineer
ORGANIZATION:
Colorectal Cancer Alliance
LOCATION:
Washington DC candidates ONLY
POSITION TYPE:
Full-Time, Exempt
REPORTS TO:
Chief Data & Analytics Officer
COMPENSATION:
$140,000-$160,000 annual salary. Healthcare benefits are available for this role.
ORGANIZATION OVERVIEW:
The Colorectal Cancer Alliance is a national nonprofit organization committed to ending colorectal cancer within our lifetime. We help patients, families, survivors, and caregivers navigate diagnosis and treatment options, connect them with those who can share experiences and knowledge, and identify resources to meet their needs. We partner with healthcare professionals and social influencers to raise awareness of preventative screening, and we collaborate with researchers to better understand the disease and fund critical research. Our efforts are urgent, effective, and efficient because we believe that tomorrow can't wait.
POSITION OVERVIEW:
At the Colorectal Cancer Alliance, we are building an innovative patient-centric, data-driven precision oncology platform to transform the future of colorectal cancer awareness, care and research. Our core systems - BlueHQ, BlueLake, and K-SPY - are designed to empower patients, caregivers, healthcare providers, and researchers through scalable, interoperable, and AI-ready architectures.
The Senior Data Engineer will play a pivotal role in designing and orchestrating robust, scalable, and governed data pipelines across the Colorectal Cancer Alliance's precision oncology platforms - BlueHQ, BlueLake, and K-SPY.
This role uniquely spans upward into DevOps and platform reliability engineering, and downward into analytics enablement and data modeling, creating a critical bridge between infrastructure, governance, and real-world data usability. The ideal candidate brings strong AWS-native experience, understands HIPAA and regulatory-grade data workflows, and thrives in environments where both back-end automation and front-end data quality are mission-critical.
You will collaborate across engineering, analytics, navigation, and research teams to fuel high-quality, semantically harmonized, and analytically ready data that powers patient navigation, real-world evidence generation, and precision clinical trial matching.
POSITION RESPONSIBILITIES:
Key responsibilities include, but are not limited to:
Data Pipeline Engineering & Architecture
Design, build, and optimize modular ETL/ELT pipelines to ingest data from REDCap (MySQL & APIs), Salesforce NPC, S3, and external EHR/FHIR feeds into AWS-native environments (Glue, Redshift, Athena, HealthLake). Engineer scalable data ingestion and transformation workflows using Python, SQL, and dbt, aligned with CI/CD principles. Enable real-time and event-driven ingestion via Kafka/Kinesis/EventBridge, and support zero-copy federation through Redshift Spectrum and Athena.
DevOps & Platform Integration
Partner with DevOps engineers to implement infrastructure-as-code (IaC) for data services (Terraform, CDK) and manage secure, compliant cloud deployments (IAM, VPC, Secrets Manager). Ensure production-grade observability through logging, alerting, and data quality checks (Bigeye, Monte Carlo, or similar). Maintain HIPAA-compliant CI/CD pipelines (e.g., GitHub Actions, dbt Cloud) and enforce version-controlled, auditable transformations.
Data Modeling & Semantic Harmonization
Model patient, clinical, and engagement data across FHIR, OMOP, and internal canonical schemas (SCF), enriched with terminologies like NCIT and SNOMED. Design schemas and transformations supporting longitudinal patient journeys, trial enrollment, navigation events, and real-world outcomes. Apply experience with both OLTP and OLAP systems.
Analytics & AI Readiness
Collaborate with Analytics Engineers and Data Scientists to prepare datasets for Analytics including ML/AI (e.g., SageMaker). Align data outputs with cohort builders, dashboards, predictive models, and RDF/SPARQL knowledge graph queries.
Metadata & Governance Enablement
Integrate with DataHub and other metadata platforms to ensure end-to-end lineage, access control, and semantic traceability. Design pipelines that are compliant with HIPAA, GDPR, 21 CFR Part 11, and IRB-approved research protocols.
REQUIRED QUALIFICATIONS:
Mandatory: 8-12+ years of data engineering experience with AWS-native services (Glue, Redshift, Athena, Lake Formation, S3).
Strong command of Python, SQL, and dbt in production-grade analytics environments.
Experience with structured and semi-structured health data sources and protocols (REDCap, Salesforce, FHIR/HL7, JSON, CSV).
Familiarity with DevOps practices, including CI/CD, secrets management, and infrastructure automation.
Strong foundation in HIPAA, data privacy, security controls, and federated data architectures.
Proven track record integrating real-world data (RWD), patient registry, or clinical trial ecosystems.
Experience in designing for analytics-ready, semantically aligned data.
PREFERRED QUALIFICATIONS:
AWS Certification (e.g., Data Analytics Specialty, Solutions Architect).
Experience with:
REDCap back-ends, APIs, and clinical registry architecture
Salesforce Nonprofit Cloud, MuleSoft, and API-led integration strategies
Metadata catalogs (DataHub) and observability tools (Bigeye, Monte Carlo)
Linked data and knowledge graph platforms (e.g., AWS Neptune, Stardog)
Real-time data movement tools (EventBridge, Kafka, Kinesis)
Background in life sciences, clinical research, or health informatics.
This Role Is Ideal If You...
Live in the Baltimore, Washington, Northern VA metro area.
Want to architect scalable, compliant, and intelligent pipelines that drive AI and analytics.
Are comfortable working "north" with DevOps on infra, and "south" with Analytics Engineers on modeling.
Enjoy aligning technical solutions with clinical, research, and patient-centric goals.
Are passionate about enabling better outcomes for cancer patients through data.
SALARY RANGE:
Competitive non-profit salary, typically ranging from $140,000 to $160,000, based on experience and qualifications.
Washington, DC candidates only. No 3rd party inquiries.
HOW TO APPLY:
To apply, please complete the application in our ADP Workforce Now application portal.
To see all employment opportunities at the Alliance, please visit our careers site.
If you encounter any issues with this application, please contact us at jobs@ccalliance.org.
STATEMENT OF NON-DISCRIMINATION:
The Colorectal Cancer Alliance does not discriminate on the basis of race, color, gender, disability, age, religion, sexual orientation, nationality, or ethnicity. We are strongly committed to hiring a diverse and multicultural staff and encourage applications from all backgrounds.