This project was inspired by recent work on using knowledge graphs to enhance retrieval.
In this project you'll find an ingestion pipeline that streamlines getting data from any type of text-containing file and parsing it into a Neo4j knowledge graph, end to end.
There is also a knowledge graph RAG implementation and a FastAPI
application that serves this content. So far we mount a Chainlit
chat application for testing, but exposing the chains as individual endpoints is planned.
The ingestion is done in multiple steps:
- Parse file contents using `tika`
- Split the resulting `Document` using a semantic splitting strategy
- Use an LLM to create `GraphDocuments` from the sequence of `Documents`
- Save the `GraphDocuments` to a running `Neo4j` instance
- Use the created graph, and the corresponding source document nodes, to create vector/keyword indices in the graph
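The semantic splitting step above can be sketched roughly as follows. This is a minimal, self-contained illustration: the `embed` function and the `threshold` value are stand-ins (a real pipeline would call an actual embedding model and compare vectors), and the `Document` class mimics the shape of LangChain's document type.

```python
from dataclasses import dataclass

@dataclass
class Document:
    page_content: str

def embed(sentence: str) -> set[str]:
    # Toy "embedding": a bag of lowercased words. A real pipeline
    # would call an embedding model and get a dense vector back.
    return set(sentence.lower().split())

def similarity(a: set[str], b: set[str]) -> float:
    # Jaccard overlap as a stand-in for cosine similarity of vectors.
    return len(a & b) / len(a | b) if a | b else 0.0

def semantic_split(text: str, threshold: float = 0.2) -> list[Document]:
    # Start a new chunk whenever the similarity between adjacent
    # sentences drops below the threshold, i.e. the topic shifts.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    chunks: list[list[str]] = [[sentences[0]]] if sentences else []
    for prev, cur in zip(sentences, sentences[1:]):
        if similarity(embed(prev), embed(cur)) < threshold:
            chunks.append([cur])
        else:
            chunks[-1].append(cur)
    return [Document(". ".join(c) + ".") for c in chunks]
```

The idea is that chunk boundaries follow meaning rather than a fixed character count, so each resulting `Document` stays topically coherent.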
This RAG setup is built around two retrievers, `hybrid`
and `structured`, whose results we combine.
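Combining the two retrievers can be as simple as concatenating their outputs into one context string for the LLM. A minimal sketch; the function name and output format here are assumptions, not the project's actual code:

```python
def combine_contexts(structured: str, unstructured: list[str]) -> str:
    # Join graph-derived facts with raw document chunks so the LLM
    # sees both structured and unstructured evidence in one prompt.
    docs = "\n".join(f"- {d}" for d in unstructured)
    return f"Structured data:\n{structured}\n\nUnstructured data:\n{docs}"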
The `structured`
retriever relies on extracting entities from a given query, which we achieve via a specialized NER
chain that returns a list
of names. With this list of names we execute a Cypher query that returns `nodes`
and `relationships`
in a format easily interpreted by LLMs.
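A sketch of how the extracted entity names might be turned into such a Cypher query. The full-text index name `entity`, the node property `id`, and the result format are assumptions; `db.index.fulltext.queryNodes` is a standard Neo4j procedure for full-text index lookups:

```python
def build_structured_query(entities: list[str]) -> tuple[str, dict]:
    # One full-text lookup per extracted entity; for each matched node
    # we return its immediate neighbourhood as "node - REL -> node"
    # strings, a shape an LLM can read directly.
    query = """
    UNWIND $names AS name
    CALL db.index.fulltext.queryNodes('entity', name) YIELD node
    MATCH (node)-[r]-(neighbor)
    RETURN node.id + ' - ' + type(r) + ' -> ' + neighbor.id AS output
    LIMIT 50
    """
    return query, {"names": entities}
```

The query and its parameters would then be passed to the Neo4j driver's `session.run(query, params)`.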
The `hybrid`
retriever is much simpler: when creating the graph database we lay the groundwork needed to also create vector embeddings and a keyword index. Embeddings are stored on the source document nodes, which we can then search by cosine similarity.
- One of many inspiring blog posts at Neo4j developer blogs
- From Local to Global: A Graph RAG Approach to Query-Focused Summarization