GE Vernova
Senior Data Architect - AI-Powered Data Platforms
GE Vernova, Niskayuna, New York, United States
We are seeking an experienced Data Architect who specializes in modernizing enterprise data platforms for the AI era. This role requires someone who deeply understands both traditional data architectures and the emerging requirements of AI systems, with expertise in bridging existing data lakes to support modern AI capabilities like Retrieval-Augmented Generation (RAG), vector search, and multi-modal AI applications. You will transform our wealth of structured and unstructured data assets into AI-ready infrastructure.
Responsibilities
Design scalable architectures for processing and indexing unstructured data (PDFs, documents, emails, logs, images) for AI consumption
Architect document processing pipelines that leverage multi-modal LLMs for direct document understanding without traditional OCR preprocessing
Implement intelligent document extraction using LLMs’ vision and context capabilities to handle complex layouts, tables, and mixed media
Design metadata extraction and enrichment pipelines that enhance discoverability of unstructured assets
Build architectures for multi-modal AI applications that combine structured and unstructured data sources
Design end-to-end RAG architectures leveraging existing data lakes and enterprise knowledge bases
Architect hybrid search systems combining traditional keyword search with semantic/vector search capabilities
Implement chunking strategies and embedding pipelines for diverse document types and data sources
Build architectures for continuous learning where RAG systems are updated with new data in near real-time
Design security and access control models across legacy systems and modern AI platforms
Create data governance frameworks that ensure compliance while enabling AI innovation
Optimize storage strategies for cost-effective management of structured and unstructured data
Design tiered storage architectures balancing performance and cost
Implement caching layers for frequently accessed embeddings and AI model inputs
Basic Qualifications
Bachelor's degree in Computer Science, Information Systems, or related field
10+ years of experience as a Data Architect, Data Platform Engineer, or similar role with enterprise data systems
5+ years of experience with both structured (SQL databases, data warehouses) and unstructured data (documents, logs, multimedia)
Understanding of modern document processing using multi-modal LLMs and traditional extraction methods
Proficiency in Python and SQL, with experience in data processing libraries
Legal authorization to work in the U.S. is required. We will not sponsor individuals at the Bachelor’s level for employment visas, now or in the future, for this job opening.
Must be 18 years or older
You must submit your application for employment on the careers page at www.careers.gevernova.com to be considered.
Preferred Qualifications
12+ years of experience modernizing legacy data architectures for cloud and AI workloads
Deep expertise in unstructured data processing using both multi-modal LLMs and traditional methods
Experience with multi-modal LLMs for document understanding and their cost/performance trade-offs
Background in information retrieval, search engineering, or content management systems
Experience with multi-modal AI architectures combining text, image, and structured data
Master’s degree in Computer Science, Information Systems, or related field
Technical Stack
Document Processing: Multi-modal LLMs (GPT-4V, Claude Vision, Gemini), LlamaParse, Unstructured.io, Azure Document Intelligence, AWS Textract (for legacy/high-volume), direct PDF-to-context pipelines
Vector/Search: Pinecone, Weaviate, pgvector
Lake Technologies: AWS S3, Azure ADLS
Languages: Python, SQL, Scala, Java
APIs: OpenAI, Anthropic, Google Vertex AI, AWS Bedrock, Azure OpenAI
Compensation & Benefits
The salary range for this position is $145,000 - $242,000 USD annually. The specific salary offered may be influenced by experience, education, and work location. This position is eligible for a performance bonus and will remain posted until at least October 5, 2025.
GE provides a comprehensive benefits package including health care coverage, a retirement plan with 401(k) matching, life insurance, disability coverage, paid time off, EAP, and more. GE Vernova is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, national or ethnic origin, sex, sexual orientation, gender identity or expression, age, disability, protected veteran status or other characteristics protected by law. Relocation assistance provided: Yes.
Company & Role Notes
As a GE Vernova accelerator, GE Vernova Advanced Research drives strategy and leads R&D to power the energy transition. Our researchers collaborate with GE Vernova’s businesses, the U.S. government, and 420+ entities on 150+ energy-focused projects.
Location: Albany, NY