Knowledge Graphs in R2R

Building and managing knowledge graphs through collections

Overview

An example knowledge graph showing extracted entities and relationships

Knowledge graphs in R2R enhance search accuracy and context understanding by extracting and connecting information from your documents. The system uses a two-level architecture:

  1. Document Level: Entities and relationships are first extracted and stored with their source documents
  2. Collection Level: Collections act as soft containers that can include documents and maintain corresponding graphs
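As a mental model only, the two levels can be sketched in plain Python. This is not R2R's internal implementation: the `Document`, `Collection`, and `pull` names below are hypothetical, and the pull is modeled as a simple rebuild for clarity. Extractions live with each document, and a collection graph is assembled from whichever documents the collection currently references.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a document and its extracted entity names."""
    id: str
    entities: set[str] = field(default_factory=set)


@dataclass
class Collection:
    """Hypothetical soft container: it references documents, it does not own them."""
    name: str
    document_ids: set[str] = field(default_factory=set)
    graph_entities: set[str] = field(default_factory=set)


def pull(collection: Collection, documents: dict[str, Document]) -> None:
    """Rebuild the collection graph from the documents it currently references."""
    collection.graph_entities = set()
    for doc_id in collection.document_ids:
        collection.graph_entities |= documents[doc_id].entities


docs = {
    "d1": Document("d1", {"DEEP_LEARNING", "MACHINE_LEARNING"}),
    "d2": Document("d2", {"MACHINE_LEARNING", "NEURAL_NETWORK"}),
}
papers = Collection("Research Papers", {"d1", "d2"})
surveys = Collection("Surveys", {"d2"})  # d2 belongs to both collections

pull(papers, docs)
pull(surveys, docs)
print(sorted(papers.graph_entities))
print(sorted(surveys.graph_entities))
```

Because the document appears in both collections by reference, its entities show up in both graphs without being copied or re-extracted.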

System Architecture

Collections provide:

  • Flexible document organization (documents can belong to multiple collections)
  • Access control and sharing
  • Graph synchronization and updates

Getting Started

1. Document-Level Extraction

First, extract entities and relationships from your underlying documents:

from r2r import R2RClient

client = R2RClient("http://localhost:7272")

# Extract entities and relationships
document_id = "your-document-id"
extract_response = client.documents.extract(document_id)

# View extracted knowledge
entities = client.documents.list_entities(document_id)
relationships = client.documents.list_relationships(document_id)
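One quick way to sanity-check an extraction is to count entities per category. The sketch below is a hypothetical helper, not part of the SDK. It assumes the entity records have already been unwrapped from the response into a list of dicts with a `category` key, matching the entity example shown later in this guide. The sample records are made up for illustration.

```python
from collections import Counter


def count_by_category(entities: list[dict]) -> Counter:
    """Count entity records per category; records without one fall under 'UNKNOWN'."""
    return Counter(entity.get("category", "UNKNOWN") for entity in entities)


# Illustrative records only; real ones come from the list_entities response.
sample_entities = [
    {"name": "DEEP_LEARNING", "category": "CONCEPT"},
    {"name": "MACHINE_LEARNING", "category": "CONCEPT"},
    {"name": "EXAMPLE_AUTHOR", "category": "PERSON"},
]
summary = count_by_category(sample_entities)
print(summary.most_common())
```

A document that yields very few entities, or only one category, is usually worth inspecting before its extractions are pulled into a collection graph.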

2. Creating Collection Graphs

Each collection maintains its own graph. Create and populate a collection:

# Create collection
collection = client.collections.create(
    "Research Papers",
    "ML research papers with knowledge graph analysis"
)
collection_id = collection["results"]["id"]

# Add documents to collection
client.collections.add_document(collection_id, document_id)

# Synthetically generate a description for the collection
client.collections.update(
    collection_id,
    generate_description=True
)

# Optional, schedule extraction for all documents in the collection
# client.graphs.extract(collection_id)

# Pull document knowledge into collection graph
client.graphs.pull(collection_id)
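When several documents go into the same collection, the steps above can be wrapped in one function. This is a minimal sketch: `build_collection_graph` is a hypothetical helper name, it uses only the client calls already shown in this section, and it assumes each document has been extracted beforehand.

```python
def build_collection_graph(client, name, description, document_ids):
    """Create a collection, add documents, and pull their extractions into its graph.

    `client` is expected to be an R2RClient; only calls shown in this guide are used.
    """
    collection = client.collections.create(name, description)
    collection_id = collection["results"]["id"]

    for document_id in document_ids:
        client.collections.add_document(collection_id, document_id)

    # Pull once, after every document has been added.
    client.graphs.pull(collection_id)
    return collection_id
```

Pulling a single time at the end, instead of after each document, avoids repeated graph updates while the collection is still being populated.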

3. Managing Collection Graphs

View and manage the collection’s knowledge graph:

# List entities in collection graph
entities = client.graphs.list_entities(collection_id)

# List relationships in collection graph
relationships = client.graphs.list_relationships(collection_id)

Example outputs:

# Entity example
{
    "name": "DEEP_LEARNING",
    "description": "A subset of machine learning using neural networks",
    "category": "CONCEPT",
    "id": "ce46e955-ed77-4c17-8169-e878baf3fbb9"
}

# Relationship example
{
    "subject": "DEEP_LEARNING",
    "predicate": "IS_SUBSET_OF",
    "object": "MACHINE_LEARNING",
    "description": "Deep learning is a specialized branch of machine learning"
}
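Relationship records in this shape are easy to index locally for inspection. The sketch below builds a subject-to-edges map from a list of relationship dicts. `build_adjacency` is a hypothetical helper, and it assumes the records have been unwrapped from the response into plain dicts like the example above.

```python
from collections import defaultdict


def build_adjacency(relationships: list[dict]) -> dict[str, list[tuple[str, str]]]:
    """Map each subject to the (predicate, object) pairs that leave it."""
    adjacency = defaultdict(list)
    for rel in relationships:
        adjacency[rel["subject"]].append((rel["predicate"], rel["object"]))
    return dict(adjacency)


relationships = [
    {
        "subject": "DEEP_LEARNING",
        "predicate": "IS_SUBSET_OF",
        "object": "MACHINE_LEARNING",
        "description": "Deep learning is a specialized branch of machine learning",
    },
]
graph = build_adjacency(relationships)
print(graph["DEEP_LEARNING"])  # [('IS_SUBSET_OF', 'MACHINE_LEARNING')]
```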

Graph Synchronization

Keep collection graphs current by re-extracting and pulling after changes:

Document Updates

When documents change:

# Update document
client.documents.update(document_id, new_content)

# Re-extract knowledge
client.documents.extract(document_id)

# Update collection graphs
client.graphs.pull(collection_id)

Cross-Collection Updates

Documents can belong to multiple collections:

# Add document to multiple collections (collection ID first, as in the earlier example)
client.collections.add_document(collection_id_1, document_id)
client.collections.add_document(collection_id_2, document_id)

# Update all relevant graphs
client.graphs.pull(collection_id_1)
client.graphs.pull(collection_id_2)
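Since one document can feed several graphs, a small helper can re-extract it once and then refresh every collection that contains it. This is a sketch under assumptions: `sync_document_graphs` is a hypothetical name, the caller supplies the list of collection IDs, and extraction is assumed to have finished before the pulls run. If your deployment schedules extraction in the background, wait for it to complete first.

```python
def sync_document_graphs(client, document_id, collection_ids):
    """Re-extract a document once, then pull every collection graph that contains it."""
    client.documents.extract(document_id)
    for collection_id in collection_ids:
        client.graphs.pull(collection_id)
    return list(collection_ids)
```

Extraction runs once per document regardless of how many collections reference it, because extractions stay with the source document.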

Access Control

Manage access to graphs through collection permissions:

# Give user access to collection and its graph (collection ID first, then user ID)
client.collections.add_user(collection_id, user_id)

# Remove access
client.collections.remove_user(collection_id, user_id)

# List users with access
users = client.collections.list_users(collection_id)
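When access is managed from an external source of truth, such as a team roster, it helps to compute the difference before calling the API. The helper below is a hypothetical, client-independent sketch. It takes plain lists of user IDs, so extracting IDs from the `list_users` response is left to the caller.

```python
def plan_access_changes(current_user_ids, desired_user_ids):
    """Return (to_add, to_remove) so that current access matches desired access."""
    current, desired = set(current_user_ids), set(desired_user_ids)
    return sorted(desired - current), sorted(current - desired)


to_add, to_remove = plan_access_changes(
    current_user_ids=["alice", "bob"],
    desired_user_ids=["bob", "carol"],
)
print(to_add, to_remove)  # ['carol'] ['alice']
```

The two resulting lists can then be passed, one ID at a time, to the add and remove calls shown above.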

Using Knowledge Graphs

Search Integration

Graphs automatically enhance search for collection members:

# Search with knowledge graph
results = client.retrieval.search(
    "What is deep learning?",
    search_settings={
        "graph_settings": {"enabled": True},  # Default is True
        "filters": {"collection_ids": {"$overlap": [collection_id]}}
    }
)
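If the same settings are reused across many queries, they can be built in one place. The function below is a hypothetical convenience wrapper that returns exactly the dictionary shape used in the search call above, for one or more collections.

```python
def graph_search_settings(collection_ids, graph_enabled=True):
    """Build a search_settings dict scoped to the given collections."""
    return {
        "graph_settings": {"enabled": graph_enabled},
        "filters": {"collection_ids": {"$overlap": list(collection_ids)}},
    }


settings = graph_search_settings(["collection-a", "collection-b"])
# Usage, assuming a connected client:
# results = client.retrieval.search("What is deep learning?", search_settings=settings)
```

Passing `graph_enabled=False` makes it easy to compare results with and without the graph for the same query.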

RAG Integration

Knowledge graphs enhance RAG responses:

# RAG with knowledge graph
response = client.retrieval.rag(
    "Explain deep learning's relationship to ML",
    search_settings={
        "graph_settings": {"enabled": True}  # Default is True
    }
)

Best Practices

Document Management

  • Extract knowledge after document updates
  • Monitor extraction quality at document level
  • Remember extractions stay with source documents
  • Consider document size and complexity when extracting

Collection Management

  • Keep collections focused on related documents
  • Use meaningful collection names and descriptions
  • Remember documents can belong to multiple collections
  • Pull changes when document extractions update

Performance Optimization

  • Start with small document sets to test extraction
  • Use collection-level operations for bulk processing
  • Monitor graph size and complexity
  • Consider using orchestration for large collections

Access Control

  • Plan collection structure around sharing needs
  • Review access permissions regularly
  • Document collection purposes and access patterns
  • Use collection metadata to track graph usage

Troubleshooting

Common issues and solutions:

  1. Missing Extractions

    • Verify document extraction completed successfully
    • Check document format and content
    • Ensure collection graph was pulled after extraction

  2. Graph Sync Issues

    • Confirm all documents are properly extracted
    • Check collection membership
    • Try resetting and re-pulling collection graph

  3. Performance Problems

    • Monitor collection size
    • Check extraction batch sizes
    • Consider splitting large collections
    • Use pagination for large result sets
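For the pagination advice, a generic offset-based iterator is sketched below. `iter_all` is a hypothetical helper that takes any function accepting an offset and a page size. Whether a given list call accepts `offset` and `limit` keyword arguments is an assumption to verify against your SDK version, so the example runs against stand-in data instead of a live server.

```python
from typing import Callable, Iterator


def iter_all(fetch_page: Callable[[int, int], list], page_size: int = 100) -> Iterator:
    """Yield items page by page until a short or empty page signals the end."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            break
        offset += page_size


# Stand-in data source so the sketch runs without a server.
fake_entities = [f"ENTITY_{i}" for i in range(250)]
collected = list(iter_all(lambda offset, limit: fake_entities[offset:offset + limit]))
print(len(collected))  # 250
```

Processing results as they are yielded, instead of collecting them into one list, keeps memory use flat for very large graphs.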

Next Steps