GraphRAG in R2R

Advanced knowledge graph retrieval and generation

Overview

GraphRAG extends traditional RAG by leveraging community detection and summarization within knowledge graphs. This approach provides richer context and more comprehensive answers by understanding how information is clustered and connected across your documents.

Architecture

Understanding Communities

Communities are automatically detected clusters of related information in your knowledge graph. They provide:

  1. Higher-level understanding of document themes
  2. Summarized context for related concepts
  3. Improved retrieval through topic-based organization

Example communities across different domains:

Domain            | Community examples
----------------- | -------------------------------------------
Scientific Papers | Research methods, theories, research teams
News Articles     | World events, industry sectors, key figures
Technical Docs    | System components, APIs, user workflows
Legal Documents   | Case types, jurisdictions, legal principles

Implementation Guide

1. Prerequisites

Ensure you have:

  • Documents ingested into a collection
  • Entities and relationships extracted
  • Graph synchronized
from r2r import R2RClient

client = R2RClient("http://localhost:7272")

# Extract entities and relationships from the collection's documents,
# then pull (synchronize) them into the collection's graph
collection_id = "your-collection-id"
client.collections.extract(collection_id)
client.graphs.pull(collection_id)
If no collection was specified when ingesting a document, R2R will assign it to the user’s default collection. You can see your available collections with the Python SDK by calling client.collections.list(). Refer to the collections cookbook for a deep dive.
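If extraction runs as a background workflow (the Hatchet monitoring described under Scaling GraphRAG suggests it can), `graphs.pull` may be called before extraction has finished. One client-side guard is to poll a readiness check before moving on. The sketch below is generic Python: `wait_until` is a hypothetical helper, and the `check` callable is something you would write yourself against whatever status information R2R exposes. No specific R2R status field is assumed.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or `timeout_s` seconds pass.

    Returns True once the condition holds, False on timeout.
    `sleep` is injectable so the helper can be tested without waiting.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

For example, call `wait_until(my_extraction_check, timeout_s=1800)` before `client.graphs.pull(collection_id)`, where `my_extraction_check` is your own (hypothetical) function. The same helper can gate any step that depends on `client.graphs.build` finishing.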

2. Building Communities

# Build communities for your collection's graph
build_response = client.graphs.build(collection_id)

The build process:

  1. Analyzes graph connectivity
  2. Identifies dense subgraphs
  3. Generates community summaries
  4. Creates findings and insights
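Identifying dense subgraphs is commonly framed as modularity maximization: a community should contain more internal edges than a random graph with the same node degrees would give it. This document does not say which clustering algorithm R2R uses; production systems usually rely on heuristics such as Louvain or Leiden. The sketch below brute-forces modularity over every two-way split of a toy graph. That only scales to a handful of nodes, but it shows the objective such heuristics optimize. `modularity` and `best_bipartition` are illustrative names, not R2R functions.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def modularity(edges: List[Edge], communities: List[Set[str]]) -> float:
    """Newman modularity: Q = sum over c of [L_c / m - (d_c / 2m)^2]."""
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    degree: Dict[str, int] = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for community in communities:
        # L_c: edges with both endpoints inside the community
        internal = sum(1 for u, v in edges if u in community and v in community)
        # d_c: summed degree of the community's nodes
        total_degree = sum(degree[n] for n in community)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q


def best_bipartition(edges: List[Edge]) -> Tuple[List[Set[str]], float]:
    """Try every split into one or two groups and keep the highest modularity."""
    nodes = sorted({n for edge in edges for n in edge})
    best: Tuple[List[Set[str]], float] = ([set(nodes)], modularity(edges, [set(nodes)]))
    first, rest = nodes[0], nodes[1:]
    # Pinning `first` to group A means each split is evaluated only once.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            group_a = {first, *extra}
            group_b = set(nodes) - group_a
            q = modularity(edges, [group_a, group_b])
            if q > best[1]:
                best = ([group_a, group_b], q)
    return best
```

On two triangles joined by a single edge (A-B-C and D-E-F, bridged by C-D), `best_bipartition` returns the two triangles with Q = 5/14 (about 0.357), while lumping all six nodes into one community scores 0. Louvain and Leiden avoid this enumeration by moving nodes greedily between communities, and they are not limited to two groups.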

3. Using GraphRAG

Once communities are built, they're included in search and RAG results whenever graph search is enabled:

# Search document chunks and graph elements together
search_response = client.retrieval.search(
    "What are the key theories?",
    search_settings={
        "graph_settings": {
            "enabled": True,
        }
    },
)

# RAG with community context (graph search is configured via search_settings)
rag_response = client.retrieval.rag(
    "Explain the relationships between theories",
    search_settings={
        "graph_settings": {
            "enabled": True,
        }
    },
)

Understanding Results

GraphRAG returns three types of results:

1. Document Chunks

{
  "chunk_id": "70c96e8f-e5d3-5912-b79b-13c5793f17b5",
  "text": "Example document text...",
  "score": 0.78,
  "metadata": {
    "document_type": "txt",
    "associated_query": "query text"
  }
}

2. Graph Elements

{
  "content": {
    "name": "CONCEPT_NAME",
    "description": "Entity description..."
  },
  "result_type": "entity",
  "score": 0.74
}

3. Communities

{
  "content": {
    "name": "Community Name",
    "summary": "High-level community description...",
    "findings": [
      "Key insight 1 with supporting evidence...",
      "Key insight 2 with supporting evidence..."
    ],
    "rating": 9.0,
    "rating_explanation": "Explanation of importance..."
  },
  "result_type": "community",
  "score": 0.57
}
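When a response mixes these result types, it can help to bucket them and turn community results into prompt-ready text. The sketch assumes you have already collected results into a list of plain dicts shaped like the examples above; the SDK's actual response objects, and where each result type sits in the response, may differ. `group_results` and `community_context` are hypothetical helpers, and labeling results that lack a `result_type` as `"chunk"` is an assumption based on the chunk example.

```python
from collections import defaultdict
from typing import Any, Dict, List

Result = Dict[str, Any]


def group_results(results: List[Result], min_score: float = 0.0) -> Dict[str, List[Result]]:
    """Bucket results by `result_type`, drop weak matches, sort each bucket by score."""
    groups: Dict[str, List[Result]] = defaultdict(list)
    for item in results:
        if item.get("score", 0.0) < min_score:
            continue
        # Assumption: results without a `result_type` (like the chunk example) are chunks.
        groups[item.get("result_type", "chunk")].append(item)
    for bucket in groups.values():
        bucket.sort(key=lambda r: r.get("score", 0.0), reverse=True)
    return dict(groups)


def community_context(community: Result) -> str:
    """Render one community result as plain text: name, summary, then findings."""
    content = community["content"]
    lines = [f"{content['name']} (rating {content.get('rating', 'n/a')})", content["summary"]]
    lines.extend(f"- {finding}" for finding in content.get("findings", []))
    return "\n".join(lines)
```

A `min_score` cutoff is a blunt instrument: the three result types are scored separately, so a single threshold may suit one type and not another.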

Scaling GraphRAG

Using Orchestration

For large collections, use R2R’s orchestration capabilities:

  1. Access the Hatchet UI at http://localhost:7274

  2. Monitor:

    • Document extraction progress
    • Community detection status
    • Error handling
    • Workflow retries
[Figure: Monitoring GraphRAG workflows in Hatchet]

Best Practices

  1. Development

    • Start with small document sets
    • Test with single documents first
    • Scale gradually to larger collections
  2. Performance

    • Monitor community size and complexity
    • Use pagination for large result sets
    • Consider splitting very large collections into smaller ones
  3. Quality

    • Review community summaries
    • Validate findings accuracy
    • Monitor retrieval relevance
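For the pagination advice above, a generic offset/limit loop keeps memory bounded when listing large result sets. `paginate` and the `fetch(offset, limit)` callable are hypothetical: you would wrap whichever R2R list call you are using, and whether that call accepts offset and limit arguments should be checked in the SDK reference.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def paginate(fetch: Callable[[int, int], List[T]], page_size: int = 100) -> Iterator[T]:
    """Yield every item from an offset/limit-style source, one page at a time.

    `fetch(offset, limit)` must return at most `limit` items; a short or
    empty page signals that there is nothing left to fetch.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch(offset, page_size)
        yield from page
        if len(page) < page_size:
            return
        offset += page_size
```

Because `paginate` is a generator, callers can stop early (for example after the first few hundred items) without fetching the remaining pages.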

Troubleshooting

Common issues and solutions:

  1. Poor Community Quality

    • Check entity extraction quality
    • Review relationship connections
    • Consider adjusting collection scope
  2. Performance Issues

    • Monitor graph size
    • Check community complexity
    • Use orchestration for large graphs
  3. Integration Problems

    • Verify extraction completion
    • Check collection synchronization
    • Review API configurations
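For poor community quality, a quick structural check on the extracted graph often narrows things down: many isolated entities, or relationships that point at entities that were never extracted, leave community detection little to work with. The sketch works on plain entity names and (subject, object) pairs. How you export those from R2R is left as an assumption, and `graph_health` and its metric names are illustrative, not R2R output.

```python
from typing import Dict, Iterable, Tuple


def graph_health(
    entities: Iterable[str], relationships: Iterable[Tuple[str, str]]
) -> Dict[str, float]:
    """Rough structural signals for a knowledge graph before community detection."""
    degree = {name: 0 for name in entities}
    linked = dangling = 0
    for subject, obj in relationships:
        if subject in degree and obj in degree:
            degree[subject] += 1
            degree[obj] += 1
            linked += 1
        else:
            # The relationship references an entity that was not extracted.
            dangling += 1
    n = len(degree)
    isolated = sum(1 for d in degree.values() if d == 0)
    return {
        "entities": n,
        "relationships": linked,
        "dangling_relationships": dangling,
        "isolated_share": isolated / n if n else 0.0,
        "avg_degree": 2 * linked / n if n else 0.0,
    }
```

A high isolated share or many dangling relationships points back at extraction quality rather than at community detection. What counts as "high" depends on your corpus; no thresholds are implied here.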

Next Steps