GraphRAG in R2R

Overview

GraphRAG extends traditional RAG by leveraging community detection and summarization within knowledge graphs. This approach provides richer context and more comprehensive answers by understanding how information is clustered and connected across your documents.

Architecture

Understanding Communities

Communities are automatically detected clusters of related information in your knowledge graph. They provide:

Higher-level understanding of document themes
Summarized context for related concepts
Improved retrieval through topic-based organization

Example communities across different domains:

Domain	Community Examples
Scientific Papers	Research methods, theories, research teams
News Articles	World events, industry sectors, key figures
Technical Docs	System components, APIs, user workflows
Legal Documents	Case types, jurisdictions, legal principles

Implementation Guide

1. Prerequisites

Ensure you have:

Documents ingested into a collection
Entities and relationships extracted
Graph synchronized

from r2r import R2RClient

client = R2RClient("http://localhost:7272")

# Setup collection and extract knowledge
collection_id = "your-collection-id"
client.collections.extract(collection_id)
client.graphs.pull(collection_id)

If no collection was specified when ingesting a document, R2R will assign it to the user’s default collection. You can see your available collections with the Python SDK by calling client.collections.list(). Refer to the collections cookbook for a deep dive.
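
The prerequisite steps depend on order: extraction has to run before the graph is synchronized. The sketch below wraps the two calls shown above in one helper. `prepare_graph` is a hypothetical name, not part of the SDK, and `client` is assumed to be an `R2RClient` or any object exposing the same `collections.extract` and `graphs.pull` methods.

```python
from typing import Any


def prepare_graph(client: Any, collection_id: str) -> dict:
    """Run the two prerequisite steps in order and return their raw responses.

    `prepare_graph` is an illustrative helper, not an R2R SDK function.
    """
    # Step 1: extract entities and relationships from the collection's documents.
    extract_response = client.collections.extract(collection_id)
    # Step 2: synchronize the extracted data into the collection's graph.
    pull_response = client.graphs.pull(collection_id)
    return {"extract": extract_response, "pull": pull_response}
```

With a live server this would be called as `prepare_graph(client, collection_id)`. If extraction runs asynchronously in your deployment, the pull may need to wait until extraction has finished.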

2. Building Communities

Collections use either user-provided or automatically generated descriptions (derived from document summaries) to establish context for community creation.

# Synthetically generate a description for the collection
client.collections.update(
    collection_id,
    generate_description=True
)

# Build communities for your collection's graph
build_response = client.graphs.build(collection_id)

The build process:

Analyzes graph connectivity
Identifies dense subgraphs
Generates community summaries
Creates findings and insights
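
To make the build steps concrete, here is a toy illustration in plain Python. It is not R2R's implementation: this page does not describe the detection algorithm, so connected components stand in for "dense subgraphs", and `summarize` is a placeholder for the LLM-written summary. The entity names are made up for the example.

```python
from collections import defaultdict


def detect_communities(edges):
    """Group nodes into connected components (a simple stand-in for community detection)."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)

    seen = set()
    communities = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first walk collects every node reachable from `start`.
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(adjacency[node] - members)
        seen |= members
        communities.append(sorted(members))
    return communities


def summarize(members):
    """Placeholder summary; a real system would generate this text with an LLM."""
    return f"Community of {len(members)} entities: " + ", ".join(members)


# Illustrative relationships between extracted entities.
edges = [
    ("Relativity", "Einstein"),
    ("Einstein", "Photoelectric effect"),
    ("Evolution", "Darwin"),
]
for members in detect_communities(edges):
    print(summarize(members))
```

The two groups that come out (one around Einstein, one around Darwin) show the idea: entities that are linked end up summarized together, while unrelated entities land in separate communities.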

3. Using GraphRAG

Once communities are built, they’re automatically integrated into search and RAG:

# Search across all levels
search_response = client.retrieval.search(
    "What are the key theories?",
    search_settings={
        "graph_settings": {
            "enabled": True,
        }
    }
)

# RAG with community context
rag_response = client.retrieval.rag(
    "Explain the relationships between theories",
    search_settings={
        "graph_settings": {
            "enabled": True
        }
    }
)

Understanding Results

GraphRAG returns three types of results:

1. Document Chunks

{
    "chunk_id": "70c96e8f-e5d3-5912-b79b-13c5793f17b5",
    "text": "Example document text...",
    "score": 0.78,
    "metadata": {
        "document_type": "txt",
        "associated_query": "query text"
    }
}

2. Graph Elements

{
    "content": {
        "name": "CONCEPT_NAME",
        "description": "Entity description..."
    },
    "result_type": "entity",
    "score": 0.74
}

3. Communities

{
    "content": {
        "name": "Community Name",
        "summary": "High-level community description...",
        "findings": [
            "Key insight 1 with supporting evidence...",
            "Key insight 2 with supporting evidence..."
        ],
        "rating": 9.0,
        "rating_explanation": "Explanation of importance..."
    },
    "result_type": "community",
    "score": 0.57
}
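
Because the three result types have different shapes, downstream code usually needs to separate them before use. The sketch below assumes the results have already been flattened into a list of dicts shaped like the examples above; the actual response envelope may differ. The `"chunk"` label for items without a `result_type` key is this sketch's own convention, and `group_by_result_type` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_result_type(results):
    """Bucket result dicts by their "result_type" field.

    Items without a "result_type" key (like the document chunk example)
    are filed under "chunk"; that label is an assumption of this sketch.
    """
    grouped = defaultdict(list)
    for item in results:
        grouped[item.get("result_type", "chunk")].append(item)
    # Highest-scoring items first within each bucket.
    for items in grouped.values():
        items.sort(key=lambda item: item.get("score", 0.0), reverse=True)
    return dict(grouped)


# Sample data trimmed from the three examples above.
results = [
    {"chunk_id": "70c96e8f-e5d3-5912-b79b-13c5793f17b5",
     "text": "Example document text...", "score": 0.78},
    {"content": {"name": "CONCEPT_NAME"}, "result_type": "entity", "score": 0.74},
    {"content": {"name": "Community Name", "rating": 9.0},
     "result_type": "community", "score": 0.57},
]
grouped = group_by_result_type(results)
print({kind: len(items) for kind, items in grouped.items()})
```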

Scaling GraphRAG

Using Orchestration

For large collections, use R2R’s orchestration capabilities:

Access Hatchet UI at http://localhost:7274
- Login: [email protected]
- Password: Admin123!!
Monitor:
- Document extraction progress
- Community detection status
- Error handling
- Workflow retries

Monitoring GraphRAG workflows in Hatchet

Best Practices

Development
- Start with small document sets
- Test with single documents first
- Scale gradually to larger collections
Performance
- Monitor community size and complexity
- Use pagination for large result sets
- Consider splitting very large collections into smaller ones
Quality
- Review community summaries
- Validate findings accuracy
- Monitor retrieval relevance
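
The pagination advice can be sketched as a generic offset/limit loop. `fetch_page` is a hypothetical callable that you would wrap around whichever listing call you use. The `offset` and `limit` parameter names are an assumption, not a documented R2R signature, and a stand-in data source is used here so the sketch runs without a server.

```python
from typing import Callable, Iterator, List


def paginate(fetch_page: Callable[[int, int], List[dict]], limit: int = 100) -> Iterator[dict]:
    """Yield items page by page until a short or empty page signals the end."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page
        if len(page) < limit:
            break
        offset += limit


# Stand-in data source; replace with a function that calls your API.
_items = [{"id": i} for i in range(250)]


def fake_fetch(offset: int, limit: int) -> List[dict]:
    return _items[offset:offset + limit]


all_items = list(paginate(fake_fetch, limit=100))
print(len(all_items))
```

Because `paginate` is a generator, only one page is held in memory at a time when the caller iterates over it instead of building a full list.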

Troubleshooting

Common issues and solutions:

Poor Community Quality
- Check entity extraction quality
- Review relationship connections
- Consider adjusting collection scope
Performance Issues
- Monitor graph size
- Check community complexity
- Use orchestration for large graphs
Integration Problems
- Verify extraction completion
- Check collection synchronization
- Review API configurations
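
For the community-quality checks, a small script can flag communities that deserve a manual look. This is a sketch under assumptions: it expects community dicts shaped like the example in "Understanding Results", the `min_rating` threshold of 5.0 is arbitrary, and `flag_weak_communities` is a hypothetical helper, not part of R2R.

```python
def flag_weak_communities(communities, min_rating=5.0):
    """Return (name, reasons) pairs for communities that look worth reviewing."""
    flagged = []
    for community in communities:
        content = community.get("content") or {}
        reasons = []
        if not (content.get("summary") or "").strip():
            reasons.append("missing summary")
        if not content.get("findings"):
            reasons.append("no findings")
        # The threshold is illustrative; tune it to your own rating scale.
        if (content.get("rating") or 0.0) < min_rating:
            reasons.append("low rating")
        if reasons:
            flagged.append((content.get("name", "<unnamed>"), reasons))
    return flagged


# Sample data: the first entry mirrors the documented example, the second is deliberately sparse.
communities = [
    {"content": {"name": "Community Name",
                 "summary": "High-level community description...",
                 "findings": ["Key insight 1 with supporting evidence..."],
                 "rating": 9.0}},
    {"content": {"name": "Sparse", "summary": "", "findings": [], "rating": 2.0}},
]
print(flag_weak_communities(communities))
```

A flagged community is a prompt to go back to entity extraction or collection scope, as listed above. The check only looks at the shape of the output, not whether the findings are accurate.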

Next Steps

Explore hybrid search integration
Learn about collection management
Set up observability
