Excel Campus Activities
Job Summary
The Data Scientist designs, builds, and operates robust, secure data pipelines that power clinical‑research products, analytics dashboards, and downstream data‑science workloads. The role partners closely with clinicians, investigators, the Office of Information Technology, collaborators’ Business Intelligence teams, and external research collaborators to translate complex biomedical data into actionable insights.
Responsibilities
Architect end‑to‑end pipelines that ingest high‑volume de‑identified clinical, genomic and phenotypic datasets from collaborators’ EHR systems (Epic Clarity/Caboodle) and cloud storage.
Build and host production‑grade web portals and REST APIs for secure researcher/clinician access supporting role‑based permissions and audit trails.
Leverage OpenAI LLMs (or similar NLP services) to auto‑extract Human Phenotype Ontology (HPO) terms from de‑identified clinical documentation.
Design high‑throughput ETL workflows that parse heterogeneous datasets for ingestion into relational databases and cloud‑native warehouses, feeding results into downstream analytics pipelines.
Design and develop real‑time capable analytical systems to integrate with and/or augment EHR systems.
Perform systems administration for data‑platform hosts, including system hardening, patch management, and firewall configuration.
Implement monitoring stacks and custom health checks to maintain near‑continuous system availability.
Translate clinical research requirements into technical specifications, producing clear data‑model diagrams, lineage documentation, and data‑dictionary artifacts.
Deliver data‑product demos to investigators, effectively showcasing how pipeline outputs support precision medicine reporting.
Champion standards for metadata management, schema versioning, and test‑driven data engineering.
Other duties as assigned.
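As a concrete illustration of the HPO‑extraction responsibility above, the sketch below maps free‑text phrases in de‑identified notes to HPO identifiers. The term dictionary is hypothetical and deliberately tiny; a production pipeline would use an LLM or NLP service together with the full Human Phenotype Ontology rather than simple substring matching.

```python
# Minimal local sketch: match note text against a (hypothetical) dictionary
# of HPO labels. Stands in for the LLM-based extraction described above.
HPO_TERMS = {
    "seizure": "HP:0001250",
    "short stature": "HP:0004322",
    "hearing loss": "HP:0000365",
}

def extract_hpo_terms(note: str) -> set[str]:
    """Return the HPO IDs whose label appears in the note (case-insensitive)."""
    text = note.lower()
    return {hpo_id for label, hpo_id in HPO_TERMS.items() if label in text}

note = "Patient presents with recurrent seizures and mild hearing loss."
print(sorted(extract_hpo_terms(note)))  # ['HP:0000365', 'HP:0001250']
```

In practice the dictionary lookup would be replaced by a prompt to an LLM service, with the matched terms validated against the official ontology before being written to the warehouse.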
Minimum Qualifications
Bachelor’s degree in Computer Science, Engineering or related field (or equivalent experience).
Seven (7) years of professional experience in data engineering, software development, or an equivalent mix of education and relevant experience in a similar role.
Preferred Qualifications
Experience with Snowflake, Microsoft Azure Synapse, or other modern data‑warehouse platforms.
Exposure to machine‑learning pipelines (e.g., using OpenAI or other LLM services).
Experience building/maintaining cloud data platforms (such as GCP, OCI, Linode, AWS, Azure) and data‑lake/warehouse solutions, as well as production workload management.
Hands‑on Linux system administration (containerization, networking, security).
Knowledge, Skills & Abilities
Expert‑level proficiency in SQL (PostgreSQL, SQL Server, MySQL, etc.) and data modeling (relational & dimensional).
Proficiency in Python (or another modern language) for ETL, API integration, and automation.
Knowledge of healthcare data standards (Epic Clarity/Caboodle, HL7/FHIR, HPO, etc.) – preferred.
Ability to mentor junior engineers and promote best practices.
Excellent communication & storytelling skills for cross‑functional collaboration.
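To illustrate the dimensional‑modeling skill listed above, here is a minimal star‑schema sketch using SQLite from the Python standard library. All table and column names are hypothetical examples, not the platform’s actual schema.

```python
# One dimension table and one fact table, plus a typical star-schema query
# that aggregates facts by a dimension attribute.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_patient (
    patient_key INTEGER PRIMARY KEY,
    cohort TEXT NOT NULL
);
CREATE TABLE fact_lab_result (
    result_id INTEGER PRIMARY KEY,
    patient_key INTEGER NOT NULL REFERENCES dim_patient(patient_key),
    test_code TEXT NOT NULL,
    value REAL NOT NULL
);
""")
conn.executemany("INSERT INTO dim_patient VALUES (?, ?)",
                 [(1, "control"), (2, "case")])
conn.executemany("INSERT INTO fact_lab_result VALUES (?, ?, ?, ?)",
                 [(10, 1, "HBA1C", 5.4), (11, 2, "HBA1C", 7.9)])

# Join the fact table to the dimension and aggregate per cohort.
rows = conn.execute("""
    SELECT d.cohort, AVG(f.value)
    FROM fact_lab_result f
    JOIN dim_patient d USING (patient_key)
    WHERE f.test_code = 'HBA1C'
    GROUP BY d.cohort
    ORDER BY d.cohort
""").fetchall()
print(rows)  # [('case', 7.9), ('control', 5.4)]
```

The same fact/dimension separation scales to warehouse platforms such as Snowflake or Azure Synapse; only the DDL dialect changes.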
EEO Statement It is the policy of The University of Texas at Arlington to provide an educational and working environment that provides equal opportunity to all members of the University community. In accordance with federal and state law, the University prohibits unlawful discrimination, including harassment, on the basis of race, color, national origin, religion, age, sex, sexual orientation, pregnancy, disability, genetic information, and/or veteran status. The University also prohibits discrimination on the basis of gender identity and gender expression. Retaliation against persons who oppose a discriminatory practice, file a charge of discrimination, or testify for, assist in, or participate in an investigative proceeding relating to discrimination is prohibited. Constitutionally protected expression will not be considered discrimination or harassment under this policy. It is the responsibility of all departments, employees, and students to ensure the University’s compliance with this policy.