GlaxoSmithKline

Data Architect II

GlaxoSmithKline, San Francisco, California, United States, 94199


Responsibilities

Partner with the Scientific Knowledge Engineering team to develop physical data models to build fit-for-purpose data products

Design data architecture aligned with enterprise-wide standards to promote interoperability

Collaborate with the platform teams and data engineers to maintain architecture principles, standards, and guidelines

Design data foundations that support GenAI workflows including RAG (Retrieval-Augmented Generation), vector databases, and embedding pipelines

Work across business areas and stakeholders to ensure consistent implementation of architecture standards

Lead architecture reviews, and maintain architecture documentation and best practices for Onyx and our stakeholders

Adopt security-first design with robust authentication and resilient connectivity

Provide best practices, leadership, and subject-matter and GSK expertise to architecture and engineering teams composed of GSK FTEs, strategic partners, and software vendors

Basic Qualifications

Bachelor's degree in computer science, engineering, data science, or a similar discipline

5+ years of experience in data architecture, data engineering, or related fields in pharma, healthcare, or life sciences R&D

3+ years of experience defining architecture standards and patterns on big data platforms

3+ years of experience with data warehouse, data lake, and enterprise big data platforms

3+ years of experience with enterprise cloud data architecture (preferably Azure or GCP) and delivering solutions at scale

3+ years of hands-on relational, dimensional, and/or analytic experience (using RDBMS, dimensional, and NoSQL data platform technologies, as well as ETL and data ingestion protocols)

Preferred Qualifications

Master's or PhD in computer science, engineering, data science, or a similar discipline

Deep knowledge and hands-on use of at least one common programming language (e.g., Python, Scala, or Java)

Experience with AI/ML data workflows: feature stores, vector databases, embedding pipelines, model serving architectures

Familiarity with GenAI/LLM data patterns: RAG architectures, prompt engineering data requirements, fine-tuning data preparation

Experience with the GCP data/analytics stack: Spark, Dataflow, Dataproc, GCS, BigQuery

Experience with enterprise data tools: Ataccama, Collibra, Acryl

Experience with Agile frameworks and tools: SAFe, Jira, Confluence, Azure DevOps

Experience applying CI/CD principles to data solutions

Experience with Spark and RAG-based architectures for data science and ML use cases

Strong communication skills, including the ability to explain technical concepts to non-technical stakeholders

Pharmaceutical, healthcare, or life sciences background

Salary range: $109,725 to $182,875 (annual base salary for new hires in this position).

GSK is an Equal Opportunity Employer. All qualified applicants will receive equal consideration for employment without regard to race, color, religion, sex (including pregnancy, gender identity, sexual orientation), parental status, national origin, age, disability, genetic information, military service or any basis prohibited under federal, state or local law.
