LLM Knowledge Graph

This project is still in progress, so stay tuned for more updates.

What is this project?

The LLM Knowledge Graph tool extracts key entities and relationships from long text documents and uses them to build detailed knowledge graphs. The GraphReader agent then performs retrieval-augmented generation (RAG) over these graphs to answer complex, multi-hop queries accurately.

Graph Builder

The Graph Builder component processes long-form text, such as Wikipedia articles, and extracts key entities, relationships, and atomic facts to construct a knowledge graph. This graph is stored in a Neo4j database for efficient querying and visualization.
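This section does not spell out how a long article is split before extraction, but the Read Chunk step in the Graph Reader sample output refers to chunks by a 32-character hexadecimal ID. A minimal sketch of that preprocessing, assuming fixed-size character windows with overlap and MD5 digests as chunk IDs (both assumptions, not confirmed details of this project; `split_into_chunks` and its defaults are hypothetical), might look like this:

```python
import hashlib


def split_into_chunks(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[dict]:
    """Split text into overlapping character windows, each keyed by an MD5 hash.

    chunk_size and overlap are illustrative defaults, not values from the project.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        chunks.append({
            # 32-character hex digest, the same shape as the chunk ID in the sample output
            "id": hashlib.md5(piece.encode("utf-8")).hexdigest(),
            "text": piece,
            "start": start,
        })
        # Stop once this window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Splitting by characters keeps the sketch dependency-free; a tokenizer-based splitter would track the LLM's context limit more closely. The overlap means a fact that straddles a boundary still appears whole in at least one chunk, and hashing the chunk text gives a stable ID that atomic facts can point back to.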

Technologies:

Large Language Model: Utilizes meta-llama/Llama-3.3-70B-Instruct to generate XML-based responses for structured extraction of entities and relationships.
Embeddings: Key entities are embedded using sentence-transformers/all-MiniLM-L6-v2, enabling similarity-based retrieval for the graph reader agent.
Python Libraries: Built with LangChain to facilitate AI interactions and manage chat history.
Database: Neo4j stores all extracted nodes and the relationships (edges) between them.
Prompt Logging: Prompts and LLM outputs are versioned and timestamped for debugging and iterative improvement.
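The exact XML schema used in the extraction prompts is not shown here, so the sketch below assumes a hypothetical layout with `<entity>`, `<relationship>`, and `<atomic_fact>` elements inside an `<extraction>` root. It uses only Python's standard `xml.etree.ElementTree` and trims any prose the model wraps around the XML before parsing. Lowercasing names is also an assumption, chosen to match the lowercase key elements in the Graph Reader sample output; `Extraction` and `parse_extraction` are illustrative names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Extraction:
    entities: list[str] = field(default_factory=list)
    relationships: list[tuple[str, str, str]] = field(default_factory=list)
    atomic_facts: list[dict] = field(default_factory=list)


def parse_extraction(xml_text: str) -> Extraction:
    """Parse a (hypothetical) XML extraction response into Python objects."""
    # LLMs often wrap XML in prose; keep only the <extraction> element.
    start = xml_text.find("<extraction>")
    end = xml_text.rfind("</extraction>")
    if start == -1 or end == -1:
        raise ValueError("no <extraction> element found in model output")
    root = ET.fromstring(xml_text[start:end + len("</extraction>")])

    result = Extraction()
    for ent in root.iter("entity"):
        name = (ent.text or "").strip().lower()
        if name and name not in result.entities:  # de-duplicate entity names
            result.entities.append(name)
    for rel in root.iter("relationship"):
        # (source, relationship type, target) triple
        result.relationships.append(
            (rel.get("source", "").lower(), rel.get("type", ""), rel.get("target", "").lower())
        )
    for fact in root.iter("atomic_fact"):
        result.atomic_facts.append({
            "text": (fact.findtext("text") or "").strip(),
            "key_elements": [(k.text or "").strip().lower() for k in fact.iter("key_element")],
        })
    return result
```

Parsing into plain Python objects keeps the extraction step separate from the Neo4j write, which also makes it easier to log and compare outputs across prompt versions.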

Sample Output:

A large selection of key entities and relationships created from the Harry Potter article on Wikipedia.

A small selection showing the graph hierarchy with selected facts from Harry Potter and the Prisoner of Azkaban.

Graph Reader

The Graph Reader is a querying agent that forms a rational plan, traverses the knowledge graph, and composes answers to complex, multi-hop questions. By combining embedding-based retrieval with graph-based reasoning, it can answer questions that traditional RAG approaches struggle with.
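The plan, traverse, and answer loop can be sketched as a small controller. This is a simplified, hypothetical version rather than the project's agent: the LLM calls (writing the rational plan, deciding whether the notebook is sufficient, and composing the answer) are passed in as plain functions, and neighbors are explored breadth-first, whereas the real agent uses LangGraph and lets the LLM choose each next action. `AgentState`, `run_graph_reader`, and `max_reads` are illustrative names.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentState:
    """Hypothetical agent state: the question, the plan, and what has been read."""
    question: str
    rational_plan: str = ""
    notebook: list[str] = field(default_factory=list)
    visited: set[str] = field(default_factory=set)
    steps: list[str] = field(default_factory=list)


def run_graph_reader(
    state: AgentState,
    plan: Callable[[str], str],                       # stand-in for the planning LLM call
    start_nodes: list[str],                           # e.g. nodes found by vector search
    read_node: Callable[[str], str],                  # returns facts or chunk text for a node
    neighbors: Callable[[str], list[str]],            # stand-in for a graph neighbor query
    is_sufficient: Callable[[str, list[str]], bool],  # stand-in for the "Termination" decision
    answer: Callable[[str, list[str]], str],          # stand-in for the final answer LLM call
    max_reads: int = 10,
) -> str:
    state.rational_plan = plan(state.question)
    state.steps.append("rational_plan")

    frontier = deque(start_nodes)
    reads = 0
    while frontier and reads < max_reads:
        node = frontier.popleft()  # breadth-first: oldest discovered node first
        if node in state.visited:
            continue
        state.visited.add(node)
        state.notebook.append(read_node(node))
        state.steps.append(f"read:{node}")
        reads += 1
        if is_sufficient(state.question, state.notebook):
            state.steps.append("termination")
            break
        # Not enough information yet: queue unvisited neighbors for multi-hop reading.
        frontier.extend(n for n in neighbors(node) if n not in state.visited)

    state.steps.append("answer")
    return answer(state.question, state.notebook)
```

Replacing the stand-in functions with LLM calls and `neighbors` with a Neo4j query would move this closer to a real agent; the `max_reads` cap keeps a poorly grounded question from walking the whole graph.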

Technologies:

Large Language Model: Utilizes openai/GPT-4o with structured output and tool calling to power the graph reader agent.
Python Libraries: Built with LangChain and LangGraph to manage agent state and action selection.
Vector Search and Graph Traversal: Matches embeddings to find relevant entities and context. If the answer is not found, neighbor nodes are explored in Neo4j to form multi-hop responses.
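Selecting the starting entities by embedding similarity can be sketched with plain cosine similarity. The sketch assumes the entity vectors are already computed; with sentence-transformers they could come from `SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(...)`, which returns 384-dimensional vectors. `top_k_entities`, `k`, and `min_score` are hypothetical names and defaults, and a production setup would more likely use a vector index than a Python loop.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_entities(
    query_vec: list[float],
    entity_vecs: dict[str, list[float]],
    k: int = 3,
    min_score: float = 0.0,
) -> list[tuple[str, float]]:
    """Rank entities by similarity to the query and keep the best k above min_score."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in entity_vecs.items()]
    scored = [s for s in scored if s[1] >= min_score]
    scored.sort(key=lambda s: s[1], reverse=True)
    return scored[:k]
```

The returned names would seed the traversal; if the facts attached to them do not answer the question, their neighbors in the graph are read next.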

Sample Output:

----- Initial -----
Question: A popular novel was compared by the Sunday Times to works by Roald Dahl. How many copies of this novel have been sold?

----- Step: 1 -----
Rational Plan: To answer the question about how many copies of the novel have been sold, we need to:
1. Identify the novel that was compared by the Sunday Times to works by Roald Dahl.
2. Find specific sales figures or estimates for this novel, which might be mentioned in the article.
3. Provide the number of copies sold based on the information available in the article.

----- Step: 2 -----
Check Atomic Facts Queue: ['the sunday times', 'roald dahl', 'book sales', '8.3 million copies', '10.8 million copies']
>>> returned the atomic facts shown in the graph screenshot.

----- Step: 3 -----
Read Chunk: ['d112ff4558f29fcb3077101f5ada5229']
>>> returned a chunk of text (identified by hash) from the original article related to the above facts.

----- Step: 4 -----
Next Action: Termination
Rationale for Next Action: The current chunk provides the sales figures for Harry Potter and the Philosopher's Stone, which is the novel compared to Roald Dahl's works by the Sunday Times. Therefore, the information needed to answer the question is complete.

----- Step: 5 -----
Analysis: The notebook entry provides a direct answer to the question. It states that the novel Harry Potter and the Philosopher's Stone was compared by the Sunday Times to works by Roald Dahl and that it has sold 120 million copies. There is no conflicting information or need for further analysis as the notebook content is clear and directly answers the question.
Answer: The novel has sold 120 million copies.

----- Return -----
Answer: The novel has sold 120 million copies.

I cleaned up this output for readability; my logging is not nearly this easy to read yet!

A small section of the Harry Potter knowledge graph showing the nodes used to answer the question above.

References & Further Reading

This project is heavily inspired by the following research:

GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models by Li et al.
Implementing GraphReader with Neo4j and LangGraph by Tomaz Bratanic
