More about RAG

RAG (Retrieval-Augmented Generation) combines large language models with targeted retrieval from your own documents. When a user asks a question, RAG first retrieves relevant information from your document collection, then uses that context to generate an accurate, contextual response. This grounds AI answers in your specific knowledge base rather than in the model's training data alone.

Before you begin

RAG in R2R has the following requirements:

  • A running R2R instance (local or deployed)
  • Access to an LLM provider (OpenAI, Anthropic, or local models)
  • Documents ingested into your R2R system
  • Basic configuration for document processing and embedding generation

What is RAG?

RAG operates in three main steps:

  1. Retrieval: Finding relevant information from your documents
  2. Augmentation: Adding this information as context for the AI
  3. Generation: Creating responses using both the context and the AI’s knowledge
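
To make the three steps concrete, here is a minimal, self-contained sketch. The retrieve_chunks and generate functions below are stand-ins for a real vector store and LLM call, not part of the R2R SDK:

    def retrieve_chunks(query: str, top_k: int = 5) -> list[str]:
        # Stand-in for vector search over your ingested documents.
        return ["R2R supports hybrid search.", "Chunks carry metadata."][:top_k]

    def generate(prompt: str) -> str:
        # Stand-in for an LLM completion call.
        return "(model answer based on: " + prompt[:60] + "...)"

    def answer(query: str) -> str:
        chunks = retrieve_chunks(query)                        # 1. Retrieval
        context = "\n\n".join(chunks)                          # 2. Augmentation
        prompt = f"Context:\n{context}\n\nQuestion: {query}"
        return generate(prompt)                                # 3. Generation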

Benefits over traditional LLM applications:

  • More accurate responses based on your specific documents
  • Reduced hallucination by grounding answers in real content
  • Ability to work with proprietary or recent information
  • Better control over AI outputs

Set up RAG with R2R

To start using RAG in R2R:

  1. Install and start R2R:

     $ pip install r2r
     $ r2r serve --docker

  2. Ingest your documents:

     $ r2r ingest-files --file-paths /path/to/your/documents

  3. Test basic RAG functionality:

     $ r2r rag --query="your question here"
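
The same flow is available from the Python SDK. A minimal sketch, assuming a local deployment on R2R's default port (the port and method names may differ across R2R versions):

    from r2r import R2RClient

    # Connect to the locally running R2R server.
    client = R2RClient("http://localhost:7272")

    # Ingest documents, then ask a question against them.
    client.ingest_files(file_paths=["/path/to/your/documents"])
    response = client.rag(query="your question here")
    print(response)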

Configure RAG settings

R2R offers several ways to customize RAG behavior:

  1. Retrieval Settings:

     # Using hybrid search (combines semantic and keyword search)
     client.rag(
         query="your question",
         vector_search_settings={"use_hybrid_search": True}
     )

     # Adjusting the number of retrieved chunks
     client.rag(
         query="your question",
         vector_search_settings={"top_k": 5}
     )

  2. Generation Settings:

     # Adjusting response style
     client.rag(
         query="your question",
         rag_generation_config={
             "temperature": 0.7,
             "model": "openai/gpt-4"
         }
     )
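
The two kinds of settings compose, so a single call can tune retrieval and generation together. A minimal sketch, assuming the client object from the setup above:

    # Retrieval and generation settings can be passed in one call.
    response = client.rag(
        query="your question",
        vector_search_settings={"use_hybrid_search": True, "top_k": 5},
        rag_generation_config={"temperature": 0.7, "model": "openai/gpt-4"}
    )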

How RAG works in R2R

R2R implements RAG as a multi-stage pipeline:

Document Processing

  • Documents are split into semantic chunks
  • Each chunk is embedded using AI models
  • Chunks are stored with metadata and relationships
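
As an illustration of this stage, here is a toy version of the ingest pipeline; split_into_chunks and embed are simplified stand-ins, not R2R internals:

    def split_into_chunks(text: str, size: int = 512) -> list[str]:
        # Naive fixed-size splitter; R2R's semantic chunking is smarter.
        words = text.split()
        return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

    def embed(chunk: str) -> list[float]:
        # Stand-in for an embedding-model call.
        return [float(len(chunk))]

    document = "..."  # your document text
    records = [
        {"text": c, "vector": embed(c), "metadata": {"source": "example.pdf"}}
        for c in split_into_chunks(document)
    ]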

Retrieval Process

  • Queries are processed using hybrid search
  • Both semantic similarity and keyword matching are considered
  • Results are ranked by relevance scores
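
A toy version of the ranking step: blend a semantic-similarity score with a keyword (BM25-style) score before sorting. The 0.5 weight is arbitrary, not an R2R default:

    def hybrid_rank(chunks, semantic_score, keyword_score, alpha=0.5):
        # semantic_score and keyword_score are callables that score a chunk;
        # blend the two relevance signals and sort highest first.
        scored = [(alpha * semantic_score(c) + (1 - alpha) * keyword_score(c), c)
                  for c in chunks]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored]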

Response Generation

  • Retrieved chunks are formatted as context
  • The LLM generates responses using this context
  • Citations and references can be included
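
A sketch of the formatting step: number the retrieved chunks so the model can cite them. This illustrates the idea and is not R2R's internal prompt format:

    def build_prompt(query: str, chunks: list[str]) -> str:
        # Numbered context lets the model cite sources like [1].
        context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
        return (f"Answer using the numbered context and cite sources.\n\n"
                f"Context:\n{context}\n\nQuestion: {query}")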

Advanced Features

  • GraphRAG for relationship-aware responses
  • Multi-step RAG for complex queries
  • Agent-based RAG for interactive conversations
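
To make the multi-step idea concrete, here is a hedged sketch that splits a complex question into sub-queries, answers each with an ordinary RAG call (reusing the client from the setup section), then synthesizes. The fixed sub-query list stands in for a model-driven planner:

    complex_query = "Compare our 2023 and 2024 revenue and explain the change."
    sub_queries = ["What was revenue in 2023?", "What was revenue in 2024?"]

    # Answer each sub-query independently, then synthesize a final answer.
    partial_answers = [client.rag(query=q) for q in sub_queries]
    final = client.rag(
        query=f"{complex_query}\n\nIntermediate findings: {partial_answers}"
    )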

Best Practices

  1. Document Processing

    • Use appropriate chunk sizes (256-1024 tokens)
    • Maintain document metadata
    • Consider document relationships
  2. Query Optimization

    • Use hybrid search for better retrieval
    • Adjust relevance thresholds
    • Monitor and analyze search performance
  3. Response Generation

    • Balance temperature to trade off creativity against accuracy
    • Use system prompts for consistent formatting
    • Implement error handling and fallbacks (see the sketch after this list)
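
A minimal fallback sketch for that last point, catching a generic exception and retrying with a simpler configuration (real code should catch your provider's specific error types):

    def rag_with_fallback(client, query):
        try:
            return client.rag(
                query=query,
                vector_search_settings={"use_hybrid_search": True}
            )
        except Exception:
            # Fall back to plain semantic search with fewer chunks.
            return client.rag(
                query=query,
                vector_search_settings={"use_hybrid_search": False, "top_k": 3}
            )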

For more detailed information, visit our RAG Configuration Guide or try our Quickstart.