More about RAG

RAG (Retrieval-Augmented Generation) combines large language models with targeted retrieval from your own documents. When a user asks a question, RAG first retrieves relevant information from your document collection, then uses that context to generate an accurate, contextual response. This grounds AI answers in your specific knowledge base rather than in the model's training data alone.

Before you begin

RAG in R2R has the following requirements:

  • A running R2R instance (local or deployed)
  • Access to an LLM provider (OpenAI, Anthropic, or local models)
  • Documents ingested into your R2R system
  • Basic configuration for document processing and embedding generation

What is RAG?

RAG operates in three main steps:

  1. Retrieval: Finding relevant information from your documents
  2. Augmentation: Adding this information as context for the AI
  3. Generation: Creating responses using both the context and the AI’s knowledge
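
To make the three steps concrete, here is a minimal, self-contained sketch. The retrieve_chunks and generate functions below are stand-ins for a real vector store and LLM call, not part of the R2R SDK:

    def retrieve_chunks(query: str, top_k: int = 5) -> list[str]:
        # Stand-in for vector search over your ingested documents.
        return ["R2R supports hybrid search.", "Chunks carry metadata."][:top_k]

    def generate(prompt: str) -> str:
        # Stand-in for an LLM completion call.
        return "(model answer based on: " + prompt[:60] + "...)"

    def answer(query: str) -> str:
        chunks = retrieve_chunks(query)                        # 1. Retrieval
        context = "\n\n".join(chunks)                          # 2. Augmentation
        prompt = f"Context:\n{context}\n\nQuestion: {query}"
        return generate(prompt)                                # 3. Generation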

Benefits over traditional LLM applications:

  • More accurate responses based on your specific documents
  • Reduced hallucination by grounding answers in real content
  • Ability to work with proprietary or recent information
  • Better control over AI outputs

Set up RAG with R2R

To start using RAG in R2R:

  1. Install and start R2R:

     $ pip install r2r
     $ r2r serve --docker

  2. Ingest your documents:

     $ r2r ingest-files --file-paths /path/to/your/documents

  3. Test basic RAG functionality:

     $ r2r rag --query="your question here"
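
The same flow is available from the Python SDK. A minimal sketch, assuming a local deployment on R2R's default port (the port and method names may differ across R2R versions):

    from r2r import R2RClient

    # Connect to the locally running R2R server.
    client = R2RClient("http://localhost:7272")

    # Ingest documents, then ask a question against them.
    client.ingest_files(file_paths=["/path/to/your/documents"])
    response = client.rag(query="your question here")
    print(response)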

Configure RAG settings

R2R offers several ways to customize RAG behavior:

  1. Retrieval Settings:

     # Using hybrid search (combines semantic and keyword search)
     client.rag(
         query="your question",
         vector_search_settings={"use_hybrid_search": True}
     )

     # Adjusting the number of retrieved chunks
     client.rag(
         query="your question",
         vector_search_settings={"top_k": 5}
     )

  2. Generation Settings:

     # Adjusting response style
     client.rag(
         query="your question",
         rag_generation_config={
             "temperature": 0.7,
             "model": "openai/gpt-4"
         }
     )
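
The two kinds of settings compose, so a single call can tune retrieval and generation together. A minimal sketch, assuming the client object from the setup above:

    # Retrieval and generation settings can be passed in one call.
    response = client.rag(
        query="your question",
        vector_search_settings={"use_hybrid_search": True, "top_k": 5},
        rag_generation_config={"temperature": 0.7, "model": "openai/gpt-4"}
    )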

How RAG works in R2R

R2R implements RAG as a multi-stage pipeline:

Document Processing

  • Documents are split into semantic chunks
  • Each chunk is embedded using AI models
  • Chunks are stored with metadata and relationships
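
As an illustration of this stage, here is a toy version of the ingest pipeline; split_into_chunks and embed are simplified stand-ins, not R2R internals:

    def split_into_chunks(text: str, size: int = 512) -> list[str]:
        # Naive fixed-size splitter; R2R's semantic chunking is smarter.
        words = text.split()
        return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

    def embed(chunk: str) -> list[float]:
        # Stand-in for an embedding-model call.
        return [float(len(chunk))]

    document = "..."  # your document text
    records = [
        {"text": c, "vector": embed(c), "metadata": {"source": "example.pdf"}}
        for c in split_into_chunks(document)
    ]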

Retrieval Process

  • Queries are processed using hybrid search
  • Both semantic similarity and keyword matching are considered
  • Results are ranked by relevance scores
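
A toy version of the ranking step: blend a semantic-similarity score with a keyword (BM25-style) score before sorting. The 0.5 weight is arbitrary, not an R2R default:

    def hybrid_rank(chunks, semantic_score, keyword_score, alpha=0.5):
        # semantic_score and keyword_score are callables that score a chunk;
        # blend the two relevance signals and sort highest first.
        scored = [(alpha * semantic_score(c) + (1 - alpha) * keyword_score(c), c)
                  for c in chunks]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored]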

Response Generation

  • Retrieved chunks are formatted as context
  • The LLM generates responses using this context
  • Citations and references can be included
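
A sketch of the formatting step: number the retrieved chunks so the model can cite them. This illustrates the idea and is not R2R's internal prompt format:

    def build_prompt(query: str, chunks: list[str]) -> str:
        # Numbered context lets the model cite sources like [1].
        context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
        return (f"Answer using the numbered context and cite sources.\n\n"
                f"Context:\n{context}\n\nQuestion: {query}")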

Advanced Features

  • GraphRAG for relationship-aware responses
  • Multi-step RAG for complex queries
  • Agent-based RAG for interactive conversations
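
To make the multi-step idea concrete, here is a hedged sketch that splits a complex question into sub-queries, answers each with an ordinary RAG call (reusing the client from the setup section), then synthesizes. The fixed sub-query list stands in for a model-driven planner:

    complex_query = "Compare our 2023 and 2024 revenue and explain the change."
    sub_queries = ["What was revenue in 2023?", "What was revenue in 2024?"]

    # Answer each sub-query independently, then synthesize a final answer.
    partial_answers = [client.rag(query=q) for q in sub_queries]
    final = client.rag(
        query=f"{complex_query}\n\nIntermediate findings: {partial_answers}"
    )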

Best Practices

  1. Document Processing

    • Use appropriate chunk sizes (256-1024 tokens)
    • Maintain document metadata
    • Consider document relationships
  2. Query Optimization

    • Use hybrid search for better retrieval
    • Adjust relevance thresholds
    • Monitor and analyze search performance
  3. Response Generation

    • Balance temperature to trade off creativity against accuracy
    • Use system prompts for consistent formatting
    • Implement error handling and fallbacks (see the sketch after this list)
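
A minimal fallback sketch for that last point, catching a generic exception and retrying with a simpler configuration (real code should catch your provider's specific error types):

    def rag_with_fallback(client, query):
        try:
            return client.rag(
                query=query,
                vector_search_settings={"use_hybrid_search": True}
            )
        except Exception:
            # Fall back to plain semantic search with fewer chunks.
            return client.rag(
                query=query,
                vector_search_settings={"use_hybrid_search": False, "top_k": 3}
            )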

For more detailed information, visit our RAG Configuration Guide or try our Quickstart.