Configure your R2R retrieval pipeline

Introduction

Retrieval in R2R is a sophisticated system that leverages ingested data to provide powerful search and Retrieval-Augmented Generation (RAG) capabilities. It combines vector-based semantic search, knowledge graph querying, and language model generation to deliver accurate and contextually relevant results.
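For example, once documents have been ingested, a single client call exercises this pipeline end to end. The snippet below is a minimal sketch using the Python client; the server address and query are placeholders, and method names may differ slightly between SDK versions.

from r2r import R2RClient

# Connect to a running R2R server (address is an assumed default; adjust as needed).
client = R2RClient("http://localhost:7272")

# Semantic search over previously ingested documents.
search_results = client.search(query="What is retrieval-augmented generation?")

# Full RAG: retrieve relevant context and generate a grounded answer.
rag_response = client.rag(query="What is retrieval-augmented generation?")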

Key Configuration Areas

To configure the retrieval system in R2R, you’ll need to focus on several areas in your r2r.toml file:

[database]
provider = "postgres"
batch_size = 256

[embedding]
provider = "litellm"
base_model = "openai/text-embedding-3-small"
base_dimension = 512
batch_size = 128
add_title_as_prefix = false
rerank_model = "None"
concurrent_request_limit = 256

[completion]
provider = "litellm"
concurrent_request_limit = 16

[completion.generation_config]
model = "openai/gpt-4"
temperature = 0.1
top_p = 1
max_tokens_to_sample = 1_024
stream = false

These settings directly impact how R2R performs retrieval operations:

  • The [database] section configures the Postgres database used for vector-based semantic search, document management, and knowledge graph-based retrieval.
  • The [embedding] section defines the model and parameters for converting text into vector embeddings.
  • The [completion] section sets up the language model used for generating responses in the RAG pipeline; its defaults can be overridden per request (see the sketch below).
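The values under [completion.generation_config] act as defaults and can typically be overridden for a single request through the GenerationConfig object used later on this page. The sketch below assumes the GenerationConfig field names mirror the TOML keys shown above.

from r2r import R2RClient, GenerationConfig

client = R2RClient()

# Override the default completion settings for a single RAG request.
# Field names mirror the [completion.generation_config] keys shown above.
response = client.rag(
    query="…",
    rag_generation_config=GenerationConfig(
        model="openai/gpt-4",
        temperature=0.1,
        top_p=1,
        max_tokens_to_sample=1_024,
        stream=False,
    ),
)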

Customization and Advanced Features

R2R’s retrieval system is highly customizable, allowing you to:

  • Implement hybrid search combining vector-based and knowledge graph queries
  • Customize search filters, limits, and query generation (see the sketch after this list)
  • Add custom pipes to the search and RAG pipelines
  • Implement reranking for improved result relevance
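As an illustration, the sketch below enables hybrid search and applies a filter and result limit at query time. The settings object and its field names (vector_search_settings, use_hybrid_search, search_filters, search_limit) are assumptions that vary between R2R versions, so check the reference for your release.

# A sketch of customizing search behaviour at query time.
# The parameter and field names below are assumptions and may differ by version.
results = client.search(
    query="…",
    vector_search_settings={
        "use_hybrid_search": True,  # combine semantic and keyword search
        "search_filters": {"document_id": {"$eq": "…"}},  # restrict to one document
        "search_limit": 10,  # cap the number of returned chunks
    },
)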

Structured Outputs

R2R supports structured outputs for RAG responses, allowing you to define specific response formats using Pydantic models. This ensures consistent, type-safe responses that can be easily validated and processed programmatically.

Some models may require the word ‘JSON’ to appear in their prompt for structured outputs to work. Be sure to update your prompt to reflect this, if necessary.

Here’s a simple example of using structured outputs with Pydantic models:

1from r2r import R2RClient, GenerationConfig
2from pydantic import BaseModel
3
4# Initialize the client
5client = R2RClient()
6
7# Define your response structure
8class ResponseModel(BaseModel):
9 answer: str
10 sources: list[str]
11
12# Make a RAG query with structured output
13response = client.rag(
14 query="…",
15 rag_generation_config=GenerationConfig(
16 response_format=ResponseModel
17 )
18)
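On the client side, the same Pydantic model can then be used to validate the generated output. The snippet below is a sketch: the exact location of the generated text inside the response object depends on your client version, so the extraction line is an assumption.

# Assumption: the generated JSON string is available somewhere on the response;
# adjust this extraction to match your client version's response shape.
raw_json = response["results"]["completion"]["choices"][0]["message"]["content"]

# Parse and validate against the declared schema (Pydantic v2 API).
structured = ResponseModel.model_validate_json(raw_json)
print(structured.answer)
print(structured.sources)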

Pipeline Architecture

Retrieval in R2R is implemented as a pipeline whose main stages are the ones described above: vector-based semantic search, knowledge graph querying, and language model generation.

Next Steps

For more detailed information on configuring specific components of the retrieval pipeline, please refer to the following pages: