GSK

Data Architect II

GSK, Cambridge, Massachusetts, US, 02140

Location: GSK, Cambridge, MA, USA (and other locations)

This position supports the research data ecosystem and enables scientists to accelerate medical discovery through modern data architecture.

Overview

The Onyx Research Data Tech organization is a full‑stack shop that powers data and analytics at scale, partnering with scientists to deliver tailored solutions.

Onyx focuses on:

Building a metadata‑enabled data experience for scientists, engineers, and decision‑makers

Providing AI/ML and data analysis environments to accelerate predictive capabilities

Engineering data at scale as a unified asset to unlock real‑time value

Responsibilities

Partner with Scientific Knowledge Engineering to develop physical data models for fit‑for‑purpose products

Design data architecture aligned with enterprise standards to promote interoperability

Collaborate with platform teams and data engineers to maintain architecture principles, standards, and guidelines

Design foundations that support GenAI workflows, including RAG, vector databases, and embedding pipelines

Work across business areas and stakeholders to ensure consistent implementation of architecture standards

Lead reviews and maintain architecture documentation and best practices for Onyx and stakeholders

Adopt a security‑first design with robust authentication and resilient connectivity

Provide leadership, subject matter expertise, and GSK knowledge to architecture and engineering teams, partners, and vendors

Qualifications

Basic

Bachelor’s degree in computer science, engineering, data science, or similar discipline

5+ years of data architecture or engineering in pharma, healthcare, or life sciences R&D

3+ years defining architecture standards on Big Data platforms

3+ years of experience with data warehouse, data lake, and enterprise big data platforms

3+ years of enterprise cloud data architecture (Azure or GCP) at scale

3+ years of hands‑on relational, dimensional, and analytic experience with RDBMS, NoSQL, ETL, and ingestion protocols

Preferred

Master’s or PhD in relevant discipline

Deep knowledge of at least one programming language (Python, Scala, Java)

Experience with AI/ML data workflows: feature stores, vector databases, embedding pipelines, model serving architectures

Familiarity with GenAI/LLM patterns: RAG, prompt engineering, data preparation

Experience with GCP data/analytics stack: Spark, Dataflow, Dataproc, GCS, BigQuery

Experience with enterprise data tools: Ataccama, Collibra, Acryl

Experience with Agile frameworks and tools: SAFe, Jira, Confluence, Azure DevOps

Experience applying CI/CD principles to data solutions

Strong communication skills to explain technical concepts to non‑technical stakeholders

Pharmaceutical, healthcare, or life sciences background

Compensation and Benefits

Annual base salary: $109,725‑$182,875 (region dependent)

Annual bonus and long‑term incentive program (share‑based)

Health care and other insurance benefits for employee and family

Retirement benefits, paid holidays, vacation, paid caregiver/parental and medical leave

Equal Opportunity

GSK is an Equal Opportunity Employer. All qualified applicants will receive equal consideration for employment without regard to race, color, religion, sex (including pregnancy, gender identity, and sexual orientation), parental status, national origin, age, disability, genetic information, military service, or any basis prohibited under federal, state, or local law.
