Contextual Enrichment
Enhancing chunk quality through contextual understanding
Contextual enrichment is currently restricted to:
- Self-hosted instances
- Enterprise tier cloud accounts
Contact our sales team for Enterprise pricing and features.
When processing documents into chunks, individual segments can sometimes lack necessary context from surrounding content. Chunk enrichment addresses this by incorporating contextual information from neighboring chunks to create more meaningful and comprehensive text segments.
Overview
Chunk enrichment is the process of enhancing individual document chunks by considering their surrounding context.
How Enrichment Works
The enrichment process runs after initial document chunking and:
- Retrieves a configurable number of preceding and succeeding chunks
- Sends the chunks, along with document summary if available, to an LLM
- Generates an enriched version that maintains the original meaning while incorporating relevant context
- Creates new embeddings for the enriched chunks
- Replaces the original chunks in the vector database
Example Enrichment
Consider this example from a technical document about spacecraft:
Chunk Enrichment Example
The enriched version incorporates crucial context about the Martian descent while maintaining the core information about temperature and stress levels. This improved chunk will likely perform better in searches related to Mars missions, atmospheric entry, or heat shield performance.
Configuration Settings
Chunk enrichment can be enabled through a custom configuration file. To learn more about managing your R2R configuration settings, read our self hosting documentation.
Chunk enrichment can modify the original text content. While this generally improves search quality, it’s crucial to note that this process mutates the underlying chunks.
Enrichment Process Details
The enrichment process handles chunks in batches for efficiency:
- Context Collection: Gathers preceding and succeeding chunks based on
n_chunks
setting - LLM Enhancement: Processes chunks through the configured LLM to incorporate context
- Fallback Handling: Maintains original chunk text if enrichment fails
- Batch Processing: Processes chunks in groups of 128 for optimal performance
- Vector Updates: Replaces original chunks with enriched versions in the vector database