GraphRAG
Learn how to build and use GraphRAG with R2R
Introduction
GraphRAG is a feature of R2R that lets you perform graph-based search and retrieval. This guide walks you through setting it up and running your first queries.
An example knowledge graph constructed from companies in the YC directory.
Graph construction can take a long time with local LLMs; we recommend using cloud LLMs for faster results.
Start server
R2R provides three configurations: R2R Light, R2R Light with local LLMs, and R2R Full with Docker+Hatchet. To get started quickly, we recommend R2R Light. For large graph workloads, we recommend R2R Full with Docker+Hatchet.
r2r serve
Ingesting files
We begin the cookbook by ingesting the default sample file aristotle.txt used across R2R tutorials and cookbooks:
r2r ingest-sample-file
# or
r2r ingest-files /path/to/your/files_or_directory
# Example Response
[{'message': 'Ingestion task queued successfully.', 'task_id': '2b16bb55-4f47-4e66-a6bd-da9e215b9793', 'document_id': '9fbe403b-c11c-5aae-8ade-ef22980c3ad1'}]
The initial ingestion step parses the given documents and inserts them into R2R's relational and vector databases, enabling document management and semantic search over them. The aristotle.txt example file is typically ingested in under 10 seconds. You can confirm that ingestion is complete by querying the documents overview table:
r2r documents-overview
# Example Response
{'id': '9fbe403b-c11c-5aae-8ade-ef22980c3ad1', 'title': 'aristotle.txt', 'user_id': '2acb499e-8428-543b-bd85-0d9098718220', 'type': 'txt', 'created_at': '2024-09-05T18:20:47.921933Z', 'updated_at': '2024-09-05T18:20:47.921938Z', 'ingestion_status': 'success', 'restructuring_status': 'pending', 'version': 'v0', 'collection_ids': [], 'metadata': {'version': 'v0'}}
When ingestion completes successfully for a given file, ingestion_status reads success in the corresponding output. You can also confirm in R2R's dashboard at http://localhost:7273 that the file has been ingested.
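If you script this check, note that the CLI prints each documents-overview entry as a Python dict literal (single quotes), not strict JSON. The minimal sketch below assumes you have captured that printed text; it parses entries with `ast.literal_eval` and lists every document whose `ingestion_status` is not yet `success`. The helper names `parse_entry` and `not_yet_ingested` are hypothetical and not part of R2R.

```python
import ast
from typing import Iterable


def parse_entry(text: str) -> dict:
    # Entries are printed as Python dict literals (single quotes),
    # so ast.literal_eval is used instead of json.loads.
    return ast.literal_eval(text)


def not_yet_ingested(entries: Iterable[dict]) -> list[str]:
    # Return the title (or id) of every entry whose ingestion_status
    # is anything other than "success".
    return [
        e.get("title", e.get("id", "?"))
        for e in entries
        if e.get("ingestion_status") != "success"
    ]


# Usage with a shortened version of the example response above.
example = "{'id': '9fbe403b-c11c-5aae-8ade-ef22980c3ad1', 'title': 'aristotle.txt', 'ingestion_status': 'success', 'version': 'v0'}"
print(not_yet_ingested([parse_entry(example)]))  # []
```

An empty list means every captured document has finished ingesting.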
Create Knowledge Graph
Knowledge graph creation is done in two steps:
- create-graph: Extracts nodes and relationships from your input document collection.
- enrich-graph: Enhances the graph structure through clustering and explaining entities (commonly referred to as GraphRAG).
# Cost Estimation step.
# collection ID is optional. If you don't specify one, the default collection will be used.
r2r create-graph --collection-id=122fdf6a-e116-546b-a8f6-e4cb2e2c0a09
Without --run, this command only performs a cost estimation step: it estimates the cost of the graph creation process without building the graph.
# Example Response
Time taken: 0.21 seconds
{
"results": {
"message": "Ran Graph Creation Estimate (not the actual run). Note that these are estimated ranges, actual values may vary. To run the KG creation process, run `create-graph` with `--run` in the cli, or `run_type=\"run\"` in the client.",
"document_count": 2,
"number_of_jobs_created": 3,
"total_chunks": 29,
"estimated_entities": "290 - 580",
"estimated_triples": "362 - 870",
"estimated_llm_calls": "348 - 638",
"estimated_total_in_out_tokens_in_millions": "0 - 1",
"estimated_total_time_in_minutes": "Depends on your API key tier. Accurate estimate coming soon. Rough estimate: 0.0 - 0.17",
"estimated_cost_in_usd": "0.0 - 0.06"
}
}
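Each estimated_* field in the response above is a string range of the form "low - high", and the time estimate prefixes its range with prose. The minimal sketch below assumes you have the response as JSON text, as printed above, and turns those ranges into numeric (low, high) pairs, for example to check the cost upper bound before adding --run. `parse_range` and `estimate_ranges` are hypothetical helpers, not R2R APIs.

```python
import json


def parse_range(value: str) -> tuple[float, float]:
    # Some fields prefix the range with prose ("... Rough estimate: 0.0 - 0.17"),
    # so keep only the text after the last colon before splitting on " - ".
    numeric = value.rsplit(":", 1)[-1].strip()
    low, high = (float(part) for part in numeric.split(" - "))
    return low, high


def estimate_ranges(response_text: str) -> dict[str, tuple[float, float]]:
    # Parse every string-valued "estimated_*" field of the results object.
    results = json.loads(response_text)["results"]
    return {
        key: parse_range(val)
        for key, val in results.items()
        if key.startswith("estimated_") and isinstance(val, str)
    }


sample = '{"results": {"estimated_entities": "290 - 580", "estimated_cost_in_usd": "0.0 - 0.06"}}'
print(estimate_ranges(sample)["estimated_cost_in_usd"])  # (0.0, 0.06)
```

The same helper applies to the enrichment estimate, whose estimated_* fields use the same format; integer fields such as document_count or total_entities are skipped.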
# Then, you can run the graph creation process with:
r2r create-graph --collection-id=<optional> --run
# Example response for R2R Light
[{'message': 'Graph created successfully, please run enrich-graph to enrich the graph for GraphRAG.'}]
# Example Response for R2R Full. This call is non-blocking and returns immediately. We can check the status using the hatchet dashboard on http://localhost:7274. Details below:
[{'message': 'Graph creation task queued successfully.', 'task_id': 'd9dae1bb-5862-4a16-abaf-5297024df390'}]
If you are using R2R Full, you can log into the Hatchet dashboard at http://localhost:7274 ([email protected] / Admin123!!) to check the status of the graph creation process. Make sure all kg-extract-* tasks are completed before running the enrich-graph step.
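If you automate this wait instead of watching the dashboard, the check reduces to "every task whose name matches kg-extract-* has finished successfully". The minimal sketch below assumes you have already collected (task name, status) pairs from Hatchet by some other means; no Hatchet API call is shown, and the task names and the "SUCCEEDED" status string are hypothetical placeholders.

```python
from fnmatch import fnmatch


def ready_for_next_step(tasks: list[tuple[str, str]], pattern: str) -> bool:
    # Statuses of all tasks whose name matches the glob pattern.
    matching = [status for name, status in tasks if fnmatch(name, pattern)]
    # Require at least one match, and that every match has succeeded.
    return bool(matching) and all(status == "SUCCEEDED" for status in matching)


# Hypothetical snapshot of task states.
tasks = [("kg-extract-1", "SUCCEEDED"), ("kg-extract-2", "RUNNING")]
print(ready_for_next_step(tasks, "kg-extract-*"))  # False
```

Requiring at least one match avoids reporting "ready" before any extraction task has been scheduled. The same check applies with the pattern community-summary-* after enrichment.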
This step creates a knowledge graph with nodes and relationships. You can inspect the extracted entities and relationships in the dashboard at http://localhost:7273 or through R2R's entity and relationship API endpoints.
Graph Enrichment
At this point the graph is searchable but not yet enriched, so the next step is graph enrichment.
The graph enrichment step performs hierarchical Leiden clustering to group entities into communities, and embeds the descriptions. These embeddings are used later in the local search stage of the pipeline. For more details on the algorithm, refer to our blog post on the topic.
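Leiden is a community-detection algorithm: it searches for a partition of the graph's nodes that scores well under a quality function, commonly modularity, and in common implementations the hierarchical variant re-clusters communities that are still too large. To illustrate what a "good" partition means, the sketch below computes Newman modularity Q = sum over communities c of (L_c / m - (d_c / (2m))^2) for an undirected, unweighted graph, where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of c's nodes. It only scores a given partition; it is not an implementation of Leiden, and R2R's actual clustering parameters are not shown here.

```python
from collections import defaultdict


def modularity(edges: list[tuple[str, str]], community: dict[str, int]) -> float:
    """Newman modularity of an undirected, unweighted graph under a node -> community map."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: sum of node degrees in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(modularity(edges, two_groups))  # about 0.357 (exactly 5/14)
```

Putting every node in one community gives Q = 0, while splitting along the two triangles gives a positive score; a Leiden-style optimizer searches for partitions with high Q.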
# collection ID is optional. If you don't specify one, the default collection will be used.
r2r enrich-graph --collection-id=122fdf6a-e116-546b-a8f6-e4cb2e2c0a09
# Similar to the graph creation step, this will run a cost estimation step to give you an estimate of the cost of the graph enrichment process.
Time taken: 0.22 seconds
{
"results": {
"total_entities": 269,
"total_triples": 345,
"estimated_llm_calls": "26 - 53",
"estimated_total_in_out_tokens_in_millions": "0.05 - 0.11",
"estimated_total_time_in_minutes": "Depends on your API key tier. Accurate estimate coming soon. Rough estimate: 0.01 - 0.02",
"estimated_cost_in_usd": "0.0 - 0.01"
}
}
# Now, you can run the graph enrichment process with:
r2r enrich-graph --collection-id=122fdf6a-e116-546b-a8f6-e4cb2e2c0a09 --run
# Example Response with R2R Light
[{'message': 'Graph enriched successfully.'}]
# Example Response with R2R Full, you can check the status using the hatchet dashboard on http://localhost:7274.
[{'message': 'Graph enrichment task queued successfully.', 'task_id': 'd9dae1bb-5862-4a16-abaf-5297024df390'}]
If you're using R2R Full, you can similarly check that all community-summary-* tasks are completed before proceeding.
The graph is now enriched: descriptions and embeddings have been added to the nodes and relationships, and each node is mapped to a community. The enriched-graph visualization previously shown here is deprecated; a new visualization tool is in development.
You can list the communities in the graph using the Communities API endpoint.
Search
A knowledge graph search performs similarity search on the entity and community description embeddings.
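To make "similarity search on description embeddings" concrete, the sketch below ranks entity descriptions by cosine similarity to a query embedding. It is a minimal illustration under stated assumptions: the three-dimensional vectors are hypothetical toy values, whereas a real deployment uses an embedding model and a vector index, and `cosine` and `top_k` are hypothetical helpers rather than R2R functions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], descriptions: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    # Score every description embedding and keep the k most similar.
    scored = [(name, cosine(query_vec, vec)) for name, vec in descriptions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical toy embeddings standing in for entity description vectors.
toy = {"Aristotle": [1.0, 0.0, 0.0], "Plato": [0.6, 0.8, 0.0], "Lyceum": [0.0, 0.0, 1.0]}
print(top_k([1.0, 0.0, 0.0], toy))  # Aristotle ranks first, Plato second
```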
r2r search --query="Who is Aristotle?" --use-kg-search
# The answer will be returned in JSON format and contains results from entities, relationships and communities. Following is a snippet of the output:
Vector search results:
[
{
'fragment_id': 'ecc754cd-380d-585f-84ac-021542ef3c1d',
'extraction_id': '92d78034-8447-5046-bf4d-e019932fbc20',
'document_id': '9fbe403b-c11c-5aae-8ade-ef22980c3ad1',
'user_id': '2acb499e-8428-543b-bd85-0d9098718220',
'collection_ids': [],
'score': 0.7393344796100582,
'text': 'Aristotle[A] (Greek: Ἀριστοτέλης Aristotélēs, pronounced [aristotélɛːs]; 384–322 BC) was an Ancient Greek philosopher and polymath. His writings cover a broad range of subjects spanning the natural sciences, philosophy, linguistics, economics, politics, psychology, and the arts. As the founder of the Peripatetic school of philosophy in the Lyceum in Athens, he began the wider Aristotelian tradition that followed, which set the groundwork for the development of modern science.\n\nLittle is known about Aristotle's life. He was born in the city of Stagira in northern Greece during the Classical period. His father, Nicomachus, died when Aristotle was a child, and he was brought up by a guardian. At 17 or 18, he joined Plato's Academy in Athens and remained there until the age of 37 (c.\u2009347 BC). Shortly after Plato died, Aristotle left Athens and, at the request of Philip II of Macedon, tutored his son Alexander the Great beginning in 343 BC. He established a library in the Lyceum, which helped him to produce many of his hundreds of books on papyrus scrolls.\n\nThough Aristotle wrote many elegant treatises and dia ...",
'metadata': {'title': 'aristotle.txt', 'version': 'v0', 'file_name': 'tmpm3ceiqs__aristotle.txt', 'chunk_order': 0, 'document_type': 'txt', 'size_in_bytes': 73353, 'unstructured_filetype': 'text/plain', 'unstructured_languages': ['eng'], 'partitioned_by_unstructured': True, 'associatedQuery': 'Who is Aristotle?'}}
}, ...
]
KG search results:
{
'local_result': {
'query': 'Who is Aristotle?',
'entities': {'0': {'name': 'Aristotle', 'description': 'Aristotle was an ancient Greek philosopher and polymath, recognized as the father of various fields including logic, biology, and political science. He authored significant works such as the *Nicomachean Ethics* and *Politics*, where he explored concepts of virtue, governance, and the nature of reality, while also critiquing Platos ideas. His teachings and observations laid the groundwork for numerous disciplines, influencing thinkers ...'}},
'relationships': {},
'communities': {'0': {'summary': '```json\n{\n "title": "Aristotle and His Contributions",\n "summary": "The community revolves around Aristotle, an ancient Greek philosopher and polymath, who made significant contributions to various fields including logic, biology, political science, and economics. His works, such as 'Politics' and 'Nicomachean Ethics', have influenced numerous disciplines and thinkers from antiquity through the Middle Ages and beyond. The relationships between his various works and the fields he contributed to highlight his profound impact on Western thought.",\n "rating": 9.5,\n "rating_explanation": "The impact severity rating is high due to Aristotle's foundational influence on multiple disciplines and his enduring legacy in Western philosophy and science.",\n "findings": [\n {\n "summary": "Aristotle's Foundational Role in Logic",\n "explanation": "Aristotle is credited with the earliest study of formal logic, and his conception of it was the dominant form of Western logic until the 19th-century advances in mathematical logic. His works compiled into a set of six bo ...}}}}
},
'global_result': None
}
Time taken: 2.39 seconds
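In the output above, each community summary is a JSON document wrapped in a json-tagged Markdown code fence inside a string. The minimal sketch below strips that fence and parses the JSON so you can read fields such as title, rating, and findings, whose names are taken from the example output. Because the example summary is truncated, the usage builds a shortened hypothetical summary; `parse_community_summary` and `top_findings` are hypothetical helpers, not R2R APIs.

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built here to avoid literal backticks


def parse_community_summary(summary: str) -> dict:
    text = summary.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (e.g. a json-tagged fence) ...
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # ... and everything from the closing fence onward.
        text = text.rsplit(FENCE, 1)[0]
    return json.loads(text)


def top_findings(summary: str) -> list[str]:
    # Titles of the findings listed in a community summary.
    return [f["summary"] for f in parse_community_summary(summary).get("findings", [])]


payload = json.dumps({"title": "Aristotle and His Contributions", "rating": 9.5,
                      "findings": [{"summary": "Aristotle's Foundational Role in Logic"}]})
wrapped = FENCE + "json\n" + payload + "\n" + FENCE
print(parse_community_summary(wrapped)["rating"])  # 9.5
```

Summaries that arrive as plain JSON without a fence are parsed unchanged.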
Conclusion
In conclusion, GraphRAG adds graph-based retrieval to your R2R applications. By representing knowledge as entities, relationships, and communities, it enables more context-aware retrieval than vector search alone. In the example query above, the knowledge graph search returned not only the matching entity but also a community summary with a structured analysis of Aristotle's key contributions.
Feel free to reach out to us at [email protected] if you have any questions or need further assistance.