Knowledge Graphs

Building and managing graphs through collections

Overview

R2R allows you to build and analyze knowledge graphs from your documents through a collection-based architecture. The system extracts entities and relationships from documents, enabling richer search capabilities that understand connections between information.

The process works in several key stages:

  • Documents are first ingested and entities/relationships are extracted
  • Collections serve as containers for documents and their corresponding graphs
  • Extracted information is pulled into the collection’s graph
  • Communities can be built to identify higher-level concepts
  • The resulting graph enhances search with relationship-aware queries

Collections in R2R are flexible containers that support multiple documents and provide features for access control and graph management. A document can belong to multiple collections, allowing for different organizational schemes and sharing patterns.
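To make the many-to-many relationship between documents and collections concrete, here is a toy in-memory model of membership. It is purely illustrative: `CollectionIndex` and the IDs used with it are hypothetical, and the sketch says nothing about how R2R actually stores collections.

```python
from collections import defaultdict


class CollectionIndex:
    """Toy model of many-to-many document/collection membership (not R2R's implementation)."""

    def __init__(self) -> None:
        # Each collection maps to the set of document IDs it contains.
        self._docs_by_collection: dict[str, set[str]] = defaultdict(set)

    def add(self, collection: str, document: str) -> None:
        """Place a document in a collection; the same document may go into many collections."""
        self._docs_by_collection[collection].add(document)

    def documents_in(self, collection: str) -> set[str]:
        """Return a copy of the documents in a collection (empty set if the collection is unknown)."""
        return set(self._docs_by_collection.get(collection, set()))

    def collections_of(self, document: str) -> set[str]:
        """Return every collection that contains the given document."""
        return {c for c, docs in self._docs_by_collection.items() if document in docs}
```

For example, adding `"gift-of-the-magi"` to both a `"default"` and a `"short-stories"` collection makes `collections_of("gift-of-the-magi")` return both names, while each collection still lists only its own documents.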

The resulting knowledge graphs can improve search accuracy by capturing relationships between concepts, rather than relying only on traditional document search.
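As a compact recap, the stages above can be strung together in one function. This is a sketch that reuses only the client calls shown in the steps of this guide; the helper name `build_graph_for_file` is hypothetical. It assumes the ingested document lands in the collection you pass (as it does with the default collection) and that extraction has finished before the pull; where extraction runs as a background job, you would wait for it first.

```python
from typing import Any


def build_graph_for_file(client: Any, file_path: str, collection_id: str) -> str:
    """Ingest one file, extract from it, pull it into a graph, and build communities.

    `client` is expected to behave like r2r.R2RClient. Hypothetical helper;
    returns the new document's ID.
    """
    # Stage 1: ingest the file, then extract entities and relationships from it.
    ingest_response = client.documents.create(file_path=file_path)
    document_id = ingest_response["results"]["document_id"]
    client.documents.extract(document_id)

    # Stage 2: pull the document-level extractions into the collection's graph.
    # Assumption: the document is already a member of `collection_id` and its
    # extraction has completed by this point.
    client.graphs.pull(collection_id=collection_id)

    # Stage 3: cluster the graph's entities and relationships into communities.
    client.graphs.build(collection_id=collection_id)
    return document_id
```

Keeping each stage as a separate call mirrors the design described above: extraction, pulling, and community building can each be rerun independently.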

1. Ingestion and Extraction

Before we can extract entities and relationships from a document, we must ingest the file. Once the file has been ingested successfully, we can extract entities and relationships from the resulting document.

In the following script, we fetch The Gift of the Magi by O. Henry and ingest it into our R2R server. We then start the extraction process, which may take a few minutes to run.

import os
import tempfile

import requests
from r2r import R2RClient

# Set up the client
client = R2RClient("http://localhost:7272")

# Fetch the text file and fail early on an HTTP error
url = "https://www.gutenberg.org/cache/epub/7256/pg7256.txt"
response = requests.get(url, timeout=30)
response.raise_for_status()

# Write the text to a temporary file (UTF-8, so non-ASCII characters survive)
temp_dir = tempfile.gettempdir()
temp_file_path = os.path.join(temp_dir, "gift_of_the_magi.txt")
with open(temp_file_path, "w", encoding="utf-8") as temp_file:
    temp_file.write(response.text)

try:
    # Ingest the file
    ingest_response = client.documents.create(file_path=temp_file_path)
    document_id = ingest_response["results"]["document_id"]

    # Extract entities and relationships
    extract_response = client.documents.extract(document_id)

    # View extracted knowledge
    entities = client.documents.list_entities(document_id)
    relationships = client.documents.list_relationships(document_id)
finally:
    # Clean up the temporary file, even if ingestion or extraction fails
    os.unlink(temp_file_path)

As this script runs, we see indications of successful ingestion and extraction.

[Figure: Both ingestion and extraction were successful, as seen in the R2R dashboard.]
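Because extraction can take a few minutes, a script that runs it may need to wait before listing entities. The helper below is a generic polling sketch, not part of the R2R SDK; `wait_until` and its default timeout and interval are hypothetical choices.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Call `check` every `interval` seconds until it returns True or `timeout` elapses.

    Returns True if the check succeeded, False if the deadline passed first.
    """
    # time.monotonic() is unaffected by system clock changes, so the deadline stays correct.
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In practice, `check` would wrap a call that looks up the document and reports whether its extraction has completed. The exact SDK method and status field can vary between R2R versions, so that wiring is left as an assumption here.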

2. Managing Collections

Graphs are built within a collection, which lets us add many documents to a graph and share our graphs with other users. When we ingested the file above, it was added to our default collection.

Each collection has a description, which is used in the graph creation process. The description can be set by the user or generated by an LLM.

from r2r import R2RClient

# Set up the client
client = R2RClient("http://localhost:7272")

# ID of the default collection in this example; replace it with your own collection's ID
collection_id = "122fdf6a-e116-546b-a8f6-e4cb2e2c0a09"

# Ask R2R to generate the collection description with an LLM
update_result = client.collections.update(
    id=collection_id,
    generate_description=True,
)

[Figure: The LLM-generated description for our collection.]
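The example above lets an LLM write the description. If you would rather supply it yourself, a small wrapper can choose between the two paths. This sketch assumes `client.collections.update` also accepts a `description` keyword alongside `generate_description`; check that against your SDK version. The helper name `describe_collection` is hypothetical.

```python
from typing import Any, Optional


def describe_collection(client: Any, collection_id: str, description: Optional[str] = None) -> Any:
    """Set a collection's description by hand, or fall back to LLM generation.

    Hypothetical helper: `client` is expected to behave like r2r.R2RClient.
    """
    if description is not None:
        # Assumed keyword: store the user-supplied description as-is.
        return client.collections.update(id=collection_id, description=description)
    # Same call as the example above: let R2R generate the description with an LLM.
    return client.collections.update(id=collection_id, generate_description=True)
```

A hand-written description gives you predictable wording; the generated one saves effort when a collection holds many documents.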

3. Pulling Extractions into the Graph

Our graph will not contain the extractions from our documents until we pull them into the graph. This gives developers more granular control over the creation and management of graphs.

Recall that we already extracted the entities and relationships from the document itself. This means we can pull a document into many graphs without rerunning the extraction process.

from r2r import R2RClient

# Set up the client
client = R2RClient("http://localhost:7272")

# Pull the extractions from all documents into the default collection's graph
collection_id = "122fdf6a-e116-546b-a8f6-e4cb2e2c0a09"
client.graphs.pull(
    collection_id=collection_id
)
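Since pulling reuses stored extractions, sharing one document across several graphs comes down to adding it to each collection and pulling. This sketch assumes the SDK exposes `client.collections.add_document(id=..., document_id=...)` for collection membership; `share_document` itself is a hypothetical helper, not part of R2R.

```python
from typing import Any, Iterable


def share_document(client: Any, document_id: str, collection_ids: Iterable[str]) -> None:
    """Add an already-extracted document to several collections and pull it into each graph.

    Extraction is not rerun: each pull reuses the document's stored entities
    and relationships.
    """
    for collection_id in collection_ids:
        # Assumed SDK call: make the document a member of this collection.
        client.collections.add_document(id=collection_id, document_id=document_id)
        # Pull the extractions of the collection's documents into its graph.
        client.graphs.pull(collection_id=collection_id)
```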

As soon as we pull the extractions into the graph, we can begin using the graph in our searches. We can also confirm that the entities and relationships were pulled into the collection.

[Figure: Entities are pulled in from the document to the collection.]

4. Building Communities

To further enhance our graph, we can build communities, which cluster the entities and relationships inside our graph. This allows us to capture higher-level concepts that exist within our data.

from r2r import R2RClient

# Set up the client
client = R2RClient("http://localhost:7272")

# Build the communities for the default collection
collection_id = "122fdf6a-e116-546b-a8f6-e4cb2e2c0a09"
client.graphs.build(
    collection_id=collection_id
)

We can see that the resulting communities capture overall themes and concepts within the story.

[Figure: The resulting communities, generated from the clustering process.]
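R2R's own clustering method is not shown here. To illustrate in the simplest terms what clustering over entities and relationships means, the sketch below groups entities into connected components: two entities end up together if any chain of relationships links them. The relationship pairs are hand-written toy data, not output from the story above. Real community detection algorithms (for example Louvain or Leiden) go further and can split one densely linked component into several communities, which connected components cannot do.

```python
from collections import defaultdict, deque


def connected_components(relationships: list[tuple[str, str]]) -> list[list[str]]:
    """Group entities so that any two linked by a chain of relationships share a cluster.

    Each relationship is an undirected (subject, object) pair. Returns a list
    of clusters, each a sorted list of entity names.
    """
    neighbors: dict[str, set[str]] = defaultdict(set)
    for subject, obj in relationships:
        neighbors[subject].add(obj)
        neighbors[obj].add(subject)

    seen: set[str] = set()
    clusters: list[list[str]] = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Breadth-first search collects every entity reachable from `start`.
        queue = deque([start])
        seen.add(start)
        cluster = []
        while queue:
            node = queue.popleft()
            cluster.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        clusters.append(sorted(cluster))
    return clusters


if __name__ == "__main__":
    toy = [("Della", "hair"), ("hair", "Madame Sofronie"), ("Jim", "watch"), ("watch", "fob chain")]
    print(connected_components(toy))
    # prints [['Della', 'Madame Sofronie', 'hair'], ['Jim', 'fob chain', 'watch']]
```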