Fabrion

Data Engineer (Founding Team)

Fabrion, San Francisco, California, United States, 94199


Data/ETL Engineer (Founding Team)

Location: San Francisco Bay Area

Type: Full-Time

Compensation: Competitive salary + early-stage equity

Backed by 8VC, we're building a world-class team to tackle one of the industry’s most critical infrastructure problems.

About the Role

We’re building a multi-tenant, AI-native platform where enterprise data becomes actionable through semantic enrichment, intelligent agents, and governed interoperability. At the heart of this architecture lies our Data Fabric — an intelligent, governed layer that turns fragmented and siloed data into a connected ontology ready for model training, vector search, and insight-to-action workflows.
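To make that concrete, here is a minimal, purely illustrative sketch of the kind of ontology-backed graph such a layer maintains. The entity types, relationship names, and in-memory store are invented for this posting rather than a description of our actual stack; a production system would sit on a graph database, not Python dictionaries.

```python
# Toy sketch of an ontology-backed knowledge graph: typed entities,
# explicit relationships, and simple traversals. Names are illustrative only.
from collections import defaultdict


class KnowledgeGraph:
    """Minimal in-memory graph keyed by node id, with named relationships."""

    def __init__(self) -> None:
        self.nodes: dict[str, dict] = {}                                   # node_id -> properties
        self.edges: dict[str, list[tuple[str, str]]] = defaultdict(list)   # relation -> (src, dst)

    def add_node(self, node_id: str, entity_type: str, **props) -> None:
        self.nodes[node_id] = {"type": entity_type, **props}

    def relate(self, src: str, rel: str, dst: str) -> None:
        self.edges[rel].append((src, dst))

    def neighbors(self, node_id: str, rel: str) -> list[str]:
        return [dst for src, dst in self.edges[rel] if src == node_id]


if __name__ == "__main__":
    g = KnowledgeGraph()
    g.add_node("supplier:acme", "Supplier", name="Acme GmbH", country="DE")
    g.add_node("part:bolt-7", "Part", description="M6 hex bolt")
    g.add_node("doc:email-42", "Document", source="email")
    g.relate("supplier:acme", "SUPPLIES", "part:bolt-7")
    g.relate("doc:email-42", "MENTIONS", "supplier:acme")
    print(g.neighbors("supplier:acme", "SUPPLIES"))  # ['part:bolt-7']
```

The point is the shape of the problem: typed entities, explicit relationships, and traversals that agents and retrieval layers can query.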

We’re looking for engineers who enjoy hard data problems at scale: messy unstructured data, schema drift, multi-source joins, security models, and AI-ready semantic enrichment. You’ll build the backend systems, data pipelines, connector frameworks, and graph-based knowledge models that fuel agentic applications.

If you’ve worked on streaming unstructured pipelines, built connectors into ugly legacy systems, or mapped knowledge graphs that scale — this role will feel like home.

Responsibilities

Build highly reliable, scalable data ingestion and transformation pipelines across structured, semi-structured, and unstructured data sources

Develop and maintain a connector framework for ingesting from enterprise systems (ERPs, PLMs, CRMs, legacy data stores, email, Excel, docs, etc.)

Design and maintain the data fabric layer — including a knowledge graph (Neo4j or Puppygraph) enriched with ontologies, metadata, and relationships

Normalize and vectorize data for downstream AI/LLM workflows — enabling retrieval-augmented generation (RAG), summarization, and alerting (see the sketch after this list)

Create and manage data contracts, access layers, lineage, and governance mechanisms

Build and expose secure APIs for downstream services, agents, and users to query enriched semantic data

Collaborate with ML/LLM teams to feed high-quality enterprise data into model training and tuning pipelines
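For a rough flavor of the pipeline work above, the sketch below takes one messy source record, chunks it, attaches source metadata, and emits embedding-ready rows. SourceRecord, chunk_text, and embed_stub are hypothetical names invented for this posting, and embed_stub merely stands in for a call to a real embedding model; none of this is our actual codebase.

```python
# Illustrative only: source record -> chunks -> embedding-ready rows with lineage metadata.
from collections.abc import Iterator
from dataclasses import dataclass, field
from hashlib import sha256


@dataclass
class SourceRecord:
    source_system: str              # e.g. "erp", "plm", "email"
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_text(text: str, max_chars: int = 500) -> Iterator[str]:
    """Naive fixed-size chunking; real pipelines would chunk on document structure."""
    for i in range(0, len(text), max_chars):
        yield text[i:i + max_chars]


def embed_stub(chunk: str) -> list[float]:
    """Stand-in for an embedding model call; returns a toy 8-dim vector."""
    digest = sha256(chunk.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:8]]


def to_vector_records(record: SourceRecord) -> list[dict]:
    """Turn one source record into rows ready for a vector store, keeping lineage."""
    return [
        {
            "id": f"{record.source_system}:{record.doc_id}:{i}",
            "vector": embed_stub(chunk),
            "text": chunk,
            "metadata": {**record.metadata, "source": record.source_system},
        }
        for i, chunk in enumerate(chunk_text(record.text))
    ]


if __name__ == "__main__":
    rec = SourceRecord("email", "msg-42", "Quarterly supplier delays reported...", {"sender": "ops@example.com"})
    for row in to_vector_records(rec):
        print(row["id"], len(row["vector"]), "dims")
```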

What We’re Looking For

Core Experience:

5+ years building large-scale data infrastructure in production environments

Deep experience with ingestion frameworks (Kafka, Airbyte, Meltano, Fivetran) and data pipeline orchestration (Airflow, Dagster, Prefect)

Comfortable processing unstructured data formats: PDFs, Excel, emails, logs, CSVs, web APIs

Experience working with columnar stores, object storage, and lakehouse formats (Iceberg, Delta, Parquet)

Strong background in knowledge graphs or semantic modeling (e.g. Neo4j, RDF, Gremlin, Puppygraph)

Familiarity with GraphQL, RESTful APIs, and designing developer-friendly data access layers

Experience implementing data governance: RBAC, ABAC, data contracts, lineage, data quality checks (see the sketch below)
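For a flavor of what data contracts and access rules mean here, a deliberately simplified, hypothetical sketch follows. The field names, roles, and the single rule are invented for illustration; in practice contracts live in a schema registry and access decisions go through a policy engine (e.g. OPA) or warehouse-native controls.

```python
# Illustrative only: a tiny data contract check plus a toy field-level access rule.
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractField:
    name: str
    dtype: type
    required: bool = True


SUPPLIER_CONTRACT = [
    ContractField("supplier_id", str),
    ContractField("country", str),
    ContractField("spend_usd", float, required=False),
]


def validate(row: dict, contract: list[ContractField]) -> list[str]:
    """Return contract violations for one row (empty list means the row conforms)."""
    errors = []
    for f in contract:
        if f.name not in row:
            if f.required:
                errors.append(f"missing required field {f.name}")
        elif not isinstance(row[f.name], f.dtype):
            errors.append(f"{f.name} expected {f.dtype.__name__}")
    return errors


def can_read(role: str, field_name: str) -> bool:
    """Toy rule: only finance roles may see spend figures."""
    sensitive = {"spend_usd"}
    return field_name not in sensitive or role == "finance"


if __name__ == "__main__":
    row = {"supplier_id": "S-100", "country": "DE", "spend_usd": 1.2e6}
    print(validate(row, SUPPLIER_CONTRACT))                           # []
    print({k: v for k, v in row.items() if can_read("analyst", k)})   # spend_usd filtered out
```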

Mindset & Culture Fit:

You’re a systems thinker: you want to model the real world, not just process it

Comfortable navigating ambiguous data models and building from scratch

Passionate about enabling AI systems with real-world, messy enterprise data

Pragmatic about scalability, observability, and schema evolution

Value autonomy, high trust, and meaningful ownership over infrastructure

Bonus Skills

Prior work with vector DBs (e.g. Weaviate, Qdrant, Pinecone) and embedding pipelines

Experience building or contributing to enterprise connector ecosystems

Knowledge of ontology versioning, graph diffing, or semantic schema alignment

Familiarity with data fabric patterns (e.g. Palantir Ontology, Linked Data, W3C standards)

Familiarity with fine-tuning LLMs or enabling RAG pipelines using enterprise knowledge

Experience enforcing data access policy with tools like OPA, Keycloak, or Snowflake row-level security

Why This Role Matters

Agents are only as smart as the data they operate on. This role builds the foundation — the semantic, governed, connected substrate — that makes autonomous decision-making and agent action possible. From factory ERP records to geopolitical news alerts, the data fabric unifies it all.

If you’re excited to tame complexity, unify chaos, and power intelligent systems with trusted data — we’d love to hear from you.
