GraphRAG

Learn how to build and use GraphRAG with R2R

Introduction

GraphRAG is a powerful feature of R2R that allows you to perform graph-based search and retrieval. This guide will walk you through the process of setting it up and running your first queries.

An example knowledge graph constructed from companies in the YC directory.

Note that graph construction can be slow with local LLMs; we recommend using cloud LLMs for faster results.

Start server

We provide three configurations for R2R: Light, Light with Local LLMs, and Full with Docker+Hatchet. If you want to get started quickly, we recommend using R2R Light. If you want to run large graph workloads, we recommend using R2R Full with Docker+Hatchet.

$r2r serve

Ingesting files

We begin the cookbook by ingesting the default sample file aristotle.txt used across R2R tutorials and cookbooks:

$r2r ingest-sample-file
># or
>r2r ingest-files /path/to/your/files_or_directory
>
># Example Response
>[{'message': 'Ingestion task queued successfully.', 'task_id': '2b16bb55-4f47-4e66-a6bd-da9e215b9793', 'document_id': '9fbe403b-c11c-5aae-8ade-ef22980c3ad1'}]

The initial ingestion step parses the given documents and inserts them into R2R’s relational and vector databases, enabling document management and semantic search over them. The aristotle.txt example file is typically ingested in under 10s. You can confirm ingestion is complete by querying the documents overview table:

$r2r documents-overview
>
># Example Response
>{'id': '9fbe403b-c11c-5aae-8ade-ef22980c3ad1', 'title': 'aristotle.txt', 'user_id': '2acb499e-8428-543b-bd85-0d9098718220', 'type': 'txt', 'created_at': '2024-09-05T18:20:47.921933Z', 'updated_at': '2024-09-05T18:20:47.921938Z', 'ingestion_status': 'success', 'restructuring_status': 'pending', 'version': 'v0', 'collection_ids': [], 'metadata': {'version': 'v0'}}

When ingestion completes successfully for a given file, ingestion_status reads success in the corresponding output. You can also confirm that the file has been ingested in R2R’s dashboard at http://localhost:7273.
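The status check described above can also be scripted. The following is a minimal sketch, assuming the CLI prints each overview entry as a single Python-style dict literal, as in the example response; the helper name ingestion_succeeded is hypothetical and not part of R2R.

```python
import ast


def ingestion_succeeded(overview_line: str) -> bool:
    """Return True if a documents-overview entry reports a successful ingestion."""
    # The example output uses single quotes (Python repr), so json.loads would
    # reject it; ast.literal_eval parses the literal without executing code.
    entry = ast.literal_eval(overview_line)
    return entry.get("ingestion_status") == "success"


# Abridged copy of the example response shown in this guide.
sample = (
    "{'id': '9fbe403b-c11c-5aae-8ade-ef22980c3ad1', 'title': 'aristotle.txt', "
    "'type': 'txt', 'ingestion_status': 'success', "
    "'restructuring_status': 'pending', 'version': 'v0', 'collection_ids': []}"
)
print(ingestion_succeeded(sample))
```

If your version of the CLI emits strict JSON instead, swapping ast.literal_eval for json.loads should be the only change needed.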

Ingested File

Create Knowledge Graph

Knowledge graph creation is done in two steps:

  1. create-graph: Extracts nodes and relationships from your input document collection.
  2. enrich-graph: Enhances the graph structure through clustering and explaining entities (commonly referred to as GraphRAG).
$# Cost Estimation step.
># collection ID is optional. If you don't specify one, the default collection will be used.
>r2r create-graph --collection-id=122fdf6a-e116-546b-a8f6-e4cb2e2c0a09
>
># This runs a cost estimation step that reports the expected cost of the graph creation process.
>
># Example Response
>Time taken: 0.21 seconds
>{
> "results": {
> "message": "Ran Graph Creation Estimate (not the actual run). Note that these are estimated ranges, actual values may vary. To run the KG creation process, run `create-graph` with `--run` in the cli, or `run_type=\"run\"` in the client.",
> "document_count": 2,
> "number_of_jobs_created": 3,
> "total_chunks": 29,
> "estimated_entities": "290 - 580",
> "estimated_triples": "362 - 870",
> "estimated_llm_calls": "348 - 638",
> "estimated_total_in_out_tokens_in_millions": "0 - 1",
> "estimated_total_time_in_minutes": "Depends on your API key tier. Accurate estimate coming soon. Rough estimate: 0.0 - 0.17",
> "estimated_cost_in_usd": "0.0 - 0.06"
> }
>}
>
># Then, you can run the graph creation process with:
>r2r create-graph --collection-id=<optional> --run
>
># Example response for R2R Light
>[{'message': 'Graph created successfully, please run enrich-graph to enrich the graph for GraphRAG.'}]
>
># Example Response for R2R Full. This call is non-blocking and returns immediately. You can check the status using the Hatchet dashboard on http://localhost:7274. Details below:
>[{'message': 'Graph creation task queued successfully.', 'task_id': 'd9dae1bb-5862-4a16-abaf-5297024df390'}]
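Before launching the actual run, it can be useful to check the estimate programmatically. The sketch below assumes the estimate output has the shape shown in the example response (a "Time taken" line followed by a JSON object); the function names and the budget threshold are hypothetical and not R2R settings.

```python
import json


def parse_range(text: str) -> tuple[float, float]:
    """Split a 'low - high' string such as '0.0 - 0.06' into two floats."""
    low, high = (float(part) for part in text.split(" - "))
    return low, high


def max_estimated_cost(cli_output: str) -> float:
    """Extract the upper bound of estimated_cost_in_usd from the estimate output."""
    # Skip the leading 'Time taken: ...' line by starting at the first brace.
    payload = json.loads(cli_output[cli_output.index("{"):])
    _, high = parse_range(payload["results"]["estimated_cost_in_usd"])
    return high


# Abridged copy of the example estimate shown in this guide.
sample_output = """Time taken: 0.21 seconds
{
  "results": {
    "document_count": 2,
    "total_chunks": 29,
    "estimated_entities": "290 - 580",
    "estimated_cost_in_usd": "0.0 - 0.06"
  }
}"""

BUDGET_USD = 0.10  # hypothetical threshold chosen for illustration
print(max_estimated_cost(sample_output) <= BUDGET_USD)
```

Because the estimates are ranges and actual values may vary, comparing against the upper bound is the more conservative choice.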

If you are using R2R Full, you can log in to the Hatchet dashboard on http://localhost:7274 ([email protected] / Admin123!!) to check the status of the graph creation process. Please make sure all the kg-extract-* tasks are completed before running the enrich-graph step.

Hatchet Dashboard

This step creates a knowledge graph with nodes and relationships. You can retrieve the entities and relationships using our dashboard at http://localhost:7273 or by calling the /v2/entities and /v2/triples API endpoints, respectively. By default, these use the entity_level=document query parameter, which returns entities and triples at the document level. In this guide, we use the default collection ID 122fdf6a-e116-546b-a8f6-e4cb2e2c0a09 when submitting requests to these endpoints.
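As a rough illustration of how such requests might be addressed, the sketch below only builds the request URLs. The endpoint paths and the entity_level parameter come from the text above; the base address http://localhost:7272 and the collection_id query parameter name are assumptions, so check them against your deployment and the API reference before use.

```python
from urllib.parse import urlencode

# Assumed API address; adjust to wherever your R2R server listens.
BASE_URL = "http://localhost:7272"
DEFAULT_COLLECTION_ID = "122fdf6a-e116-546b-a8f6-e4cb2e2c0a09"


def kg_url(resource: str, collection_id: str = DEFAULT_COLLECTION_ID,
           entity_level: str = "document") -> str:
    """Build a request URL for the /v2/entities or /v2/triples endpoint."""
    if resource not in ("entities", "triples"):
        raise ValueError(f"unknown resource: {resource}")
    # 'collection_id' as a query parameter name is an assumption.
    query = urlencode({"collection_id": collection_id, "entity_level": entity_level})
    return f"{BASE_URL}/v2/{resource}?{query}"


print(kg_url("entities"))
print(kg_url("triples"))
```

The resulting strings can be passed to any HTTP client, with whatever authentication your server requires.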

Graph Enrichment

We now have a searchable graph, but it is not yet enriched: it has no community-level information. We will now run the enrichment step.

The graph enrichment step performs hierarchical Leiden clustering to create communities, and embeds the descriptions. These embeddings are used later in the local search stage of the pipeline. For more detail on the algorithm, please refer to the blog post here.
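To give some intuition for what a "good" community assignment means, the sketch below computes modularity, a quality score that community detection methods such as Leiden commonly optimize. This is not Leiden itself and not R2R's implementation; it only scores two hand-written partitions of a toy graph, and all names are illustrative.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph: two triangles joined by a single bridge edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

# Splitting along the bridge scores higher than lumping everything together.
print(modularity(edges, two_groups) > modularity(edges, one_group))
```

A hierarchical method repeats this kind of grouping at several levels, so that small, tight communities nest inside broader ones.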

$# collection ID is optional. If you don't specify one, the default collection will be used.
>r2r enrich-graph --collection-id=122fdf6a-e116-546b-a8f6-e4cb2e2c0a09
>
># As with graph creation, this first runs a cost estimation step that reports the expected cost of the graph enrichment process.
>Time taken: 0.22 seconds
>{
> "results": {
> "total_entities": 269,
> "total_triples": 345,
> "estimated_llm_calls": "26 - 53",
> "estimated_total_in_out_tokens_in_millions": "0.05 - 0.11",
> "estimated_total_time_in_minutes": "Depends on your API key tier. Accurate estimate coming soon. Rough estimate: 0.01 - 0.02",
> "estimated_cost_in_usd": "0.0 - 0.01"
> }
>}
>
># Now, you can run the graph enrichment process with:
>r2r enrich-graph --collection-id=122fdf6a-e116-546b-a8f6-e4cb2e2c0a09 --run
>
># Example Response with R2R Light
>[{'message': 'Graph enriched successfully.'}]
>
># Example Response with R2R Full. You can check the status using the Hatchet dashboard on http://localhost:7274.
>[{'message': 'Graph enrichment task queued successfully.', 'task_id': 'd9dae1bb-5862-4a16-abaf-5297024df390'}]

If you’re using R2R Full, you can similarly check that all community-summary-* tasks are completed before proceeding.

The graph is now enriched: descriptions and embeddings have been added to the nodes and relationships, and each node is mapped to a community. The visualization of the enriched graph is currently deprecated; we are working on a new visualization tool.

You can see the list of communities in the graph using the following API endpoint:

A knowledge graph search performs similarity search on the entity and community description embeddings.

$r2r search --query="Who is Aristotle?" --use-kg-search
>
># The answer will be returned in JSON format and contains results from entities, relationships and communities. Following is a snippet of the output:
>
>Vector search results:
>[
> {
> 'fragment_id': 'ecc754cd-380d-585f-84ac-021542ef3c1d',
> 'extraction_id': '92d78034-8447-5046-bf4d-e019932fbc20',
> 'document_id': '9fbe403b-c11c-5aae-8ade-ef22980c3ad1',
> 'user_id': '2acb499e-8428-543b-bd85-0d9098718220',
> 'collection_ids': [],
> 'score': 0.7393344796100582,
> 'text': 'Aristotle[A] (Greek: Ἀριστοτέλης Aristotélēs, pronounced [aristotélɛːs]; 384–322 BC) was an Ancient Greek philosopher and polymath. His writings cover a broad range of subjects spanning the natural sciences, philosophy, linguistics, economics, politics, psychology, and the arts. As the founder of the Peripatetic school of philosophy in the Lyceum in Athens, he began the wider Aristotelian tradition that followed, which set the groundwork for the development of modern science.\n\nLittle is known about Aristotle's life. He was born in the city of Stagira in northern Greece during the Classical period. His father, Nicomachus, died when Aristotle was a child, and he was brought up by a guardian. At 17 or 18, he joined Plato's Academy in Athens and remained there until the age of 37 (c.\u2009347 BC). Shortly after Plato died, Aristotle left Athens and, at the request of Philip II of Macedon, tutored his son Alexander the Great beginning in 343 BC. He established a library in the Lyceum, which helped him to produce many of his hundreds of books on papyrus scrolls.\n\nThough Aristotle wrote many elegant treatises and dia ...",
> 'metadata': {'title': 'aristotle.txt', 'version': 'v0', 'file_name': 'tmpm3ceiqs__aristotle.txt', 'chunk_order': 0, 'document_type': 'txt', 'size_in_bytes': 73353, 'unstructured_filetype': 'text/plain', 'unstructured_languages': ['eng'], 'partitioned_by_unstructured': True, 'associatedQuery': 'Who is Aristotle?'}
> }, ...
>]
>
>KG search results:
>[
> {
> "method": "local",
> "content": {
> "name": "Aristotle",
> "description": "Aristotle was an ancient Greek philosopher and polymath.",
> "metadata": null
> },
> "result_type": "entity",
> "extraction_ids": [
> "56b82827-f6c5-5f52-af98-bde5441c1c0d",
> "1ad47232-453b-5d1a-841c-3912403cbe21",
> "a6d061ee-d5e1-568a-8910-6e86e196b82e",
> "bef9569b-c17b-5bdc-85a1-42b143ba4ceb"
> ],
> "metadata": {
> "associated_query": "Who is Aristotle?"
> }
> },
> {
> "method": "local",
> "content": {
> "name": "Aristotle and His Contributions",
> "summary": "The community revolves around Aristotle, an ancient Greek philosopher and polymath, who made significant contributions to various fields including logic, biology, political science, and economics. His works, such as 'Politics' and 'Nicomachean Ethics', have influenced numerous disciplines and thinkers from antiquity through the Middle Ages and beyond. The relationships between his various works and the fields he contributed to highlight his profound impact on Western thought.",
> "rating": 9.5,
> "rating_explanation": "The impact severity rating is high due to Aristotle's foundational influence on multiple disciplines and his enduring legacy in Western philosophy and science.",
> "findings": [
> " Aristotle is credited with the earliest study of formal logic, and his conception of it was the dominant form of Western logic until the 19th-century advances in mathematical logic. His works compiled into a set of six books, which were later translated into Arabic and Latin, played a crucial role in the development of logic and philosophy. [Data: Entities (15561), Relationships (14388, 14389)]",
> "Aristotle's contributions to biology were foundational, laying the groundwork for the study of living organisms. His works on plants, animals, and human physiology have influenced medical science and continue to be referenced in modern biological research. [Data: Entities (15561), Relationships (14388, 14389)]",
> "Aristotle's political theories, including his concept of the 'mixed constitution' and his ideas on the ideal state, have been influential in shaping political thought and governance. His works on ethics and politics have been studied and debated for centuries, influencing thinkers from antiquity to the present day. [Data: Entities (15561), Relationships (14388, 14389)]",
> "Aristotle's influence on economics is evident in his theories on value, supply, and demand, which laid the groundwork for modern economic thought. His ideas on the 'golden mean' and the concept of the 'invisible hand' have been referenced in numerous economic theories and continue to be studied in modern economic analysis. [Data: Entities (15561), Relationships (14388, 14389)]",
> "Aristotle's teachings on rhetoric and persuasion have been studied and debated for centuries, influencing thinkers from antiquity to the present day. His works on rhetoric and persuasion have been referenced in numerous philosophical and literary texts, highlighting his enduring influence on the study of communication and persuasion. [Data: Entities (15561), Relationships (14388, 14389)]"
> ],
> "metadata": null
> },
> "result_type": "community",
> "extraction_ids": null,
> "metadata": {
> "associated_query": "Who is Aristotle?"
> }
> }
>]
>Time taken: 2.39 seconds
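The similarity search over entity and community description embeddings can be pictured with a small sketch. It ranks a few toy items by cosine similarity to a query vector; the three-dimensional vectors and the "Unrelated topic" item are invented for illustration, and R2R's actual retrieval runs against its vector database rather than an in-memory list.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, items, top_k=2):
    """Return the names of the top_k items most similar to the query."""
    scored = sorted(items, key=lambda item: cosine(query_vec, item["embedding"]),
                    reverse=True)
    return [item["name"] for item in scored[:top_k]]


# Toy vectors; real embeddings come from an embedding model.
items = [
    {"name": "Aristotle", "result_type": "entity",
     "embedding": [0.9, 0.1, 0.0]},
    {"name": "Aristotle and His Contributions", "result_type": "community",
     "embedding": [0.7, 0.3, 0.1]},
    {"name": "Unrelated topic", "result_type": "entity",
     "embedding": [0.0, 0.1, 0.9]},
]
query = [1.0, 0.0, 0.0]  # stands in for the embedded query "Who is Aristotle?"
print(rank(query, items))
```

Entities and communities share one ranking here, which mirrors how the example output mixes both result types under the "local" method.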

Conclusion

Integrating R2R with GraphRAG adds graph-based knowledge representations to your RAG applications, allowing more nuanced and context-aware information retrieval. The example query above illustrates this: it returned not only relevant text chunks but also a structured, community-level analysis of Aristotle's key contributions.

This combination makes R2R with GraphRAG well suited to advanced information retrieval and analysis tasks.

Feel free to reach out to us at [email protected] if you have any questions or need further assistance.

Advanced GraphRAG Techniques

If you want to learn more about the advanced techniques that we use in GraphRAG, please refer to the Advanced GraphRAG Techniques page.