Advanced GraphRAG Techniques with R2R
R2R supports advanced GraphRAG techniques that can be easily configured at runtime. This flexibility allows you to experiment with different SoTA strategies and optimize your RAG pipeline for specific use cases.
Advanced GraphRAG techniques are still a beta feature in R2R. There may be limitations in observability and analytics when implementing them.
Are we missing an important technique? If so, please let us know at [email protected].
Prompt Tuning
One way that we can improve upon GraphRAG’s already impressive capabilities is by tuning our prompts to a specific domain. When we create a knowledge graph, an LLM extracts the relationships between entities; but for highly targeted domains, a general-purpose prompt may fall short.
To demonstrate this, we can run GraphRAG over the technical papers behind the 2024 Nobel Prizes in chemistry, medicine, and physics. By tuning our prompts for GraphRAG, we first summarize our documents at a high level, then use that summary to provide the LLM with a more pointed description of the domain.
The following script, which utilizes the Python SDK, generates the tuned prompts and calls the knowledge graph creation process with these prompts at runtime:
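Since the script itself is not reproduced here, the sketch below illustrates the core idea in plain Python. The base prompt text, the tune_prompt helper, and the domain summary are illustrative assumptions for this example, not R2R’s actual tuning pipeline or SDK API:

```python
# Sketch of runtime prompt tuning: derive a pointed domain description
# from a high-level document summary, then specialize a base entity
# description prompt with it before graph creation.
# NOTE: the prompt text and helper below are illustrative, not R2R's
# actual prompts or SDK methods.

BASE_ENTITY_DESCRIPTION_PROMPT = (
    "Provide a comprehensive yet concise description of the entity, "
    "based on the following information:\n{context}"
)

def tune_prompt(base_prompt: str, domain_summary: str) -> str:
    """Prepend a pointed domain description so the LLM extracts
    entities and relationships with domain-specific focus."""
    preamble = (
        "You are analyzing documents in the following domain:\n"
        f"{domain_summary}\n"
        "Favor precise technical entities and relationships from this domain.\n\n"
    )
    return preamble + base_prompt

# Hypothetical high-level summary of the document collection.
domain = (
    "Technical papers underlying the 2024 Nobel Prizes in chemistry, "
    "medicine, and physics (e.g., microRNA biology, neural networks)."
)
tuned = tune_prompt(BASE_ENTITY_DESCRIPTION_PROMPT, domain)
```

In R2R, the tuned prompt would then be passed to the knowledge graph creation call at runtime, overriding the default extraction prompt for that run.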
For illustrative purposes, we can look at the graphrag_entity_description prompt before and after prompt tuning. It’s clear that with prompt tuning, we are able to capture the intent of the documents, giving us a more targeted prompt overall.
Prompt after Prompt Tuning
After prompt tuning, we see an increase in the number of communities; these communities also appear more focused and domain-specific, with clearer thematic boundaries.
Prompt tuning produces:
- More precise community separation: GraphRAG alone produced a single “MicroRNA Research” community, while GraphRAG with prompt tuning produced communities around “C. elegans MicroRNA Research”, “LET-7 MicroRNA”, and “miRNA-184 and EDICT Syndrome”.
- Enhanced domain focus: Previously, we had a single community for “AI Researchers”, but with prompt tuning we create specialized communities such as “Hinton, Hopfield, and Deep Learning”, “Hochreiter and Schmidhuber”, and “Minsky and Papert’s ANN Critique”.
Prompt tuning allows us to generate communities that better reflect the natural organization of the domain knowledge while maintaining more precise technical and thematic boundaries between related concepts.
Contextual Chunk Enrichment
Contextual chunk enrichment is a technique that allows us to capture the semantic meaning of the entities and relationships in the knowledge graph. This is done by using a combination of the entity’s textual description and its contextual embeddings. This enrichment process enhances the quality and depth of information in your knowledge graph by:
- Analyzing the surrounding context of each entity mention
- Incorporating semantic information from related passages
- Preserving important contextual nuances that might be lost in simple entity extraction
You can learn more about contextual chunk enrichment here.
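As a toy illustration of the idea, the sketch below enriches each chunk with text from its neighbors so that entity mentions retain surrounding context. In R2R the enrichment is LLM-driven; the Chunk model, field names, and neighbor-window strategy here are simplifying assumptions for this example only:

```python
# Minimal sketch of contextual chunk enrichment: each chunk is augmented
# with text drawn from neighboring chunks, preserving context that plain
# per-chunk entity extraction would lose.
# NOTE: illustrative only; R2R's actual enrichment uses an LLM and a
# different data model.

from dataclasses import dataclass

@dataclass
class Chunk:
    id: int
    text: str
    enriched_text: str = ""

def enrich_chunks(chunks: list[Chunk], window: int = 1) -> list[Chunk]:
    """Attach a window of neighboring chunk text as context."""
    for i, chunk in enumerate(chunks):
        lo, hi = max(0, i - window), min(len(chunks), i + window + 1)
        context = " ".join(c.text for c in chunks[lo:hi] if c.id != chunk.id)
        chunk.enriched_text = f"{chunk.text}\n[Context: {context}]"
    return chunks

docs = [
    Chunk(0, "LET-7 is a microRNA."),
    Chunk(1, "It regulates developmental timing."),
    Chunk(2, "It was discovered in C. elegans."),
]
enrich_chunks(docs)
```

With enrichment, the middle chunk’s pronoun “It” is resolvable because the LET-7 mention from the preceding chunk travels with it into extraction.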
Entity Deduplication
When creating a knowledge graph across multiple documents, entities are initially created at the document level. This means that the same real-world entity (e.g., “Albert Einstein” or “CRISPR”) might appear multiple times if it’s mentioned in different documents. This duplication can lead to:
- Redundant information in your knowledge graph
- Fragmented relationships across duplicate entities
- Increased storage and processing overhead
- Potentially inconsistent entity descriptions
The deduplicate-entities endpoint addresses these issues by:
- Identifying similar entities using name (exact match, other strategies coming soon)
- Merging their properties and relationships
- Maintaining the most comprehensive description
- Removing the duplicate entries
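The merge logic described above can be sketched in plain Python: group entities by exact name match, union their relationships, and keep the most comprehensive description. The dictionary schema below is illustrative, not R2R’s internal data model:

```python
# Sketch of the exact-name deduplication strategy: group entities by
# name, merge their relationships, and keep the most comprehensive
# (here: longest) description.
# NOTE: the entity schema is illustrative, not R2R's internal format.

from collections import defaultdict

def deduplicate_entities(entities: list[dict]) -> list[dict]:
    groups = defaultdict(list)
    for e in entities:
        groups[e["name"]].append(e)  # exact match on name

    merged = []
    for name, group in groups.items():
        merged.append({
            "name": name,
            # keep the most comprehensive description
            "description": max((e["description"] for e in group), key=len),
            # union of relationships across all duplicates
            "relationships": sorted({r for e in group for r in e["relationships"]}),
        })
    return merged

entities = [
    {"name": "CRISPR", "description": "Gene editing tool.",
     "relationships": ["edits:DNA"]},
    {"name": "CRISPR",
     "description": "A gene-editing system derived from bacterial immunity.",
     "relationships": ["derived_from:bacteria"]},
]
deduped = deduplicate_entities(entities)
```

After merging, the two document-level “CRISPR” entries collapse into a single entity carrying both relationships and the fuller description.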
Monitoring Deduplication
You can monitor the deduplication process in two ways:

1. Hatchet Dashboard: Access the dashboard at http://localhost:7274 to view:
   - Task status and progress
   - Any errors or warnings
   - Completion time estimates

2. API Endpoints: Once deduplication is complete, verify the results using these endpoints with entity_level = collection:
Best Practices
When using entity deduplication:
- Run deduplication after initial graph creation but before any enrichment steps
- Monitor the number of entities before and after to ensure expected reduction
- Review a sample of merged entities to verify accuracy
- For large collections, expect the process to take longer and plan accordingly