Advanced GraphRAG Techniques with R2R

R2R supports advanced GraphRAG techniques that can be easily configured at runtime. This flexibility allows you to experiment with different SoTA strategies and optimize your RAG pipeline for specific use cases.

Advanced GraphRAG techniques are still a beta feature in R2R. There may be limitations in observability and analytics when implementing them.

Are we missing an important technique? If so, then please let us know at [email protected].

Prompt Tuning

One way we can improve upon GraphRAG’s already impressive capabilities is by tuning our prompts to a specific domain. When we create a knowledge graph, an LLM extracts the relationships between entities; but for very targeted domains, a general-purpose prompt may fall short.

To demonstrate this, we can run GraphRAG over the technical papers for the 2024 Nobel Prizes in chemistry, medicine, and physics. Prompt tuning first analyzes our documents at a high level, then uses that understanding to provide the LLM with a more pointed description of the domain.

The following script, which utilizes the Python SDK, generates the tuned prompts and calls the knowledge graph creation process with these prompts at runtime:

# Step 1: Tune the prompts for knowledge graph creation
# Tune the entity description prompt
entity_prompt_response = client.get_tuned_prompt(
    prompt_name="graphrag_entity_description"
)
tuned_entity_prompt = entity_prompt_response['results']['tuned_prompt']

# Tune the triples extraction prompt
triples_prompt_response = client.get_tuned_prompt(
    prompt_name="graphrag_triples_extraction_few_shot"
)
tuned_triples_prompt = triples_prompt_response['results']['tuned_prompt']

# Step 2: Create the knowledge graph
kg_settings = {
    "kg_entity_description_prompt": tuned_entity_prompt
}

# Generate the initial graph
graph_response = client.create_graph(
    run_type="run",
    kg_creation_settings=kg_settings
)

# Step 3: Clean up the graph by removing duplicate entities
client.deduplicate_entities(
    run_type="run",
    collection_id='122fdf6a-e116-546b-a8f6-e4cb2e2c0a09'
)

# Step 4: Tune and apply community reports prompt for graph enrichment
community_prompt_response = client.get_tuned_prompt(
    prompt_name="graphrag_community_reports"
)
tuned_community_prompt = community_prompt_response['results']['tuned_prompt']

# Configure enrichment settings
kg_enrichment_settings = {
    "community_reports_prompt": tuned_community_prompt
}

# Enrich the graph with additional information
client.enrich_graph(
    run_type="run",
    kg_enrichment_settings=kg_enrichment_settings
)

For illustrative purposes, we can look at the graphrag_entity_description prompt before and after prompt tuning. It’s clear that with prompt tuning we capture the intent of the documents, giving us a more targeted prompt overall.

Provide a comprehensive yet concise summary of the given entity, incorporating its description and associated triples:

Entity Info:
{entity_info}
Triples:
{triples_txt}

Your summary should:
1. Clearly define the entity's core concept or purpose
2. Highlight key relationships or attributes from the triples
3. Integrate any relevant information from the existing description
4. Maintain a neutral, factual tone
5. Be approximately 2-3 sentences long

Ensure the summary is coherent, informative, and captures the essence of the entity within the context of the provided information.

After prompt tuning, we see an increase in the number of communities, and these communities appear more focused and domain-specific, with clearer thematic boundaries.

Prompt tuning produces:

  • More precise community separation: GraphRAG alone produced a single MicroRNA Research Community, while GraphRAG with prompt tuning produced communities around C. elegans MicroRNA Research, LET-7 MicroRNA, and miRNA-184 and EDICT Syndrome.
  • Enhanced domain focus: Previously, we had a single community for AI Researchers, but with prompt tuning we create specialized communities such as Hinton, Hopfield, and Deep Learning, Hochreiter and Schmidhuber, and Minsky and Papert's ANN Critique.
| Count       | GraphRAG | GraphRAG with Prompt Tuning |
|-------------|----------|-----------------------------|
| Entities    | 661      | 636                         |
| Triples     | 509      | 503                         |
| Communities | 29       | 41                          |

Prompt tuning allows us to generate communities that better reflect the natural organization of the domain knowledge, while maintaining more precise technical and thematic boundaries between related concepts.

Contextual Chunk Enrichment

Contextual chunk enrichment is a technique that allows us to capture the semantic meaning of the entities and relationships in the knowledge graph. This is done by using a combination of the entity’s textual description and its contextual embeddings. This enrichment process enhances the quality and depth of information in your knowledge graph by:

  1. Analyzing the surrounding context of each entity mention
  2. Incorporating semantic information from related passages
  3. Preserving important contextual nuances that might be lost in simple entity extraction
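The steps above can be sketched in plain Python. This is a hypothetical illustration of the idea, not R2R's implementation: each chunk is re-described using the text of its neighboring chunks, so the stored representation keeps context that an isolated chunk would lose. The `build_enrichment_prompt` helper and the final rewrite instruction are assumptions for illustration.

```python
def build_enrichment_prompt(chunks: list[str], index: int, window: int = 1) -> str:
    """Format an LLM prompt that asks for `chunks[index]` rewritten with
    the surrounding chunks as context (window = neighbors on each side)."""
    start = max(0, index - window)
    end = min(len(chunks), index + window + 1)
    # Gather neighboring chunks, excluding the target chunk itself
    context = "\n".join(
        c for i, c in enumerate(chunks[start:end], start) if i != index
    )
    return (
        f"Chunk:\n{chunks[index]}\n\n"
        f"Surrounding context:\n{context}\n\n"
        "Rewrite the chunk so it is understandable on its own."
    )

chunks = [
    "LET-7 was discovered in C. elegans.",
    "It regulates developmental timing.",  # 'It' is ambiguous without context
    "Homologs exist in humans.",
]
prompt = build_enrichment_prompt(chunks, index=1)
```

Sending this prompt to an LLM would resolve the ambiguous "It" by grounding the chunk in its neighbors before the chunk is embedded or added to the graph.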

You can learn more about contextual chunk enrichment here.

Entity Deduplication

When creating a knowledge graph across multiple documents, entities are initially created at the document level. This means that the same real-world entity (e.g., “Albert Einstein” or “CRISPR”) might appear multiple times if it’s mentioned in different documents. This duplication can lead to:

  • Redundant information in your knowledge graph
  • Fragmented relationships across duplicate entities
  • Increased storage and processing overhead
  • Potentially inconsistent entity descriptions

The deduplicate-entities endpoint addresses these issues by:

  1. Identifying similar entities by name (currently exact match; other strategies coming soon)
  2. Merging their properties and relationships
  3. Maintaining the most comprehensive description
  4. Removing the duplicate entries
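The merge logic described above can be sketched as follows. This is a minimal illustration of the exact-name-match strategy, not R2R's internal code; the entity dict shape is an assumption:

```python
from collections import defaultdict

def deduplicate(entities: list[dict]) -> list[dict]:
    """Merge entities that share the same name: union their relationships
    and keep the most comprehensive (here: longest) description."""
    groups = defaultdict(list)
    for e in entities:
        groups[e["name"]].append(e)
    merged = []
    for name, group in groups.items():
        merged.append({
            "name": name,
            "description": max((e["description"] for e in group), key=len),
            "relationships": sorted({r for e in group for r in e["relationships"]}),
        })
    return merged

entities = [
    {"name": "CRISPR", "description": "Gene editing tool",
     "relationships": ["used_by:Doudna"]},
    {"name": "CRISPR", "description": "A gene editing technology derived from bacteria",
     "relationships": ["discovered_in:bacteria"]},
    {"name": "Doudna", "description": "Biochemist", "relationships": []},
]
deduped = deduplicate(entities)  # two CRISPR entries collapse into one
```

Exact name matching is deliberately conservative: it never merges distinct entities, at the cost of missing aliases like "A. Einstein" vs. "Albert Einstein", which is why other strategies are planned.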
$ r2r deduplicate-entities --collection-id=122fdf6a-e116-546b-a8f6-e4cb2e2c0a09 --run

# Example Response
[{'message': 'Deduplication task queued successfully.', 'task_id': 'd9dae1bb-5862-4a16-abaf-5297024df390'}]

Monitoring Deduplication

You can monitor the deduplication process in two ways:

  1. Hatchet Dashboard: Access the dashboard at http://localhost:7274 to view:

    • Task status and progress
    • Any errors or warnings
    • Completion time estimates
  2. API Endpoints: Once deduplication is complete, verify the results using the knowledge graph endpoints with entity_level = collection.
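Since the endpoint only queues a task, scripts typically need to wait for completion before verifying results. A generic polling helper like the sketch below works with any status check; `poll` is a callable you supply (e.g. wrapping a status request), since this is not a documented R2R SDK function:

```python
import time

def wait_for_task(poll, timeout: float = 300.0, interval: float = 5.0) -> str:
    """Call `poll()` until it reports a terminal status or `timeout` elapses.

    `poll` should return one of: "queued", "running", "completed", "failed".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = poll()
        if status in ("completed", "failed"):
            return status
        time.sleep(interval)
    return "timeout"

# Demo with a stubbed status sequence instead of a real API call
statuses = iter(["queued", "running", "completed"])
result = wait_for_task(lambda: next(statuses), timeout=1.0, interval=0.0)
```

Using `time.monotonic()` rather than `time.time()` keeps the deadline correct even if the system clock is adjusted mid-run.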

Best Practices

When using entity deduplication:

  • Run deduplication after initial graph creation but before any enrichment steps
  • Monitor the number of entities before and after to ensure expected reduction
  • Review a sample of merged entities to verify accuracy
  • For large collections, expect the process to take longer and plan accordingly
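The "monitor the number of entities" practice above can be automated with a small sanity check. This is a hypothetical helper (the 50% threshold is an illustrative assumption, not an R2R default):

```python
def dedup_report(before: int, after: int, max_expected_reduction: float = 0.5) -> str:
    """Compare entity counts before/after deduplication and flag anomalies."""
    if before == 0:
        return "error: no entities before deduplication"
    if after > before:
        # Deduplication should never add entities
        return "error: entity count increased"
    reduction = (before - after) / before
    if reduction > max_expected_reduction:
        # A very large drop may indicate over-aggressive merging
        return f"warning: {reduction:.0%} reduction, review merged entities"
    return f"ok: {reduction:.0%} reduction"

# Using the counts from the prompt-tuning experiment above (661 -> 636)
print(dedup_report(661, 636))
```

A modest reduction (a few percent, as in the Nobel Prize corpus) is typical; a drop beyond the threshold is worth spot-checking against a sample of merged entities.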