Learn how to configure RAG in your R2R deployment

RAG Customization

RAG (Retrieval-Augmented Generation) in R2R can be extensively customized to suit various use cases. The main components for customization are:

  1. Generation Configuration: Control the language model’s behavior.
  2. Search Settings: Fine-tune the retrieval process.
  3. Task Prompt Override: Customize the system prompt for specific tasks.

LLM Provider Configuration

Refer to the LLM configuration page here.

Retrieval Configuration

Refer to the retrieval configuration page here.

Combining LLM and Retrieval Configuration for RAG

The rag_generation_config parameter allows you to customize the language model's behavior, while vector_search_settings and kg_search_settings control the retrieval process. Default settings are defined server-side in r2r.toml, as described in the previous configuration guides, and can be overridden at runtime as shown below:

# Configure vector search
vector_search_settings = {
    "use_vector_search": True,
    "search_limit": 20,
    "use_hybrid_search": True,
    "selected_collection_ids": ["c3291abf-8a4e-5d9d-80fd-232ef6fd8526"]
}

# Configure GraphRAG search
kg_search_settings = {
    "use_kg_search": True,
    "kg_search_type": "local",
    "kg_search_level": None,
    "generation_config": {
        "model": "gpt-4",
        "temperature": 0.1
    },
    "entity_types": ["Person", "Organization"],
    "relationships": ["worksFor", "foundedBy"],
    "max_community_description_length": 65536,
    "max_llm_queries_for_global_search": 250,
    "local_search_limits": {"__Entity__": 20, "__Relationship__": 20, "__Community__": 20}
}

# Configure LLM generation
rag_generation_config = {
    "model": "anthropic/claude-3-opus-20240229",
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens_to_sample": 1500,
    "stream": True,
    "functions": None,            # For function calling, if supported
    "tools": None,                # For tool use, if supported
    "add_generation_kwargs": {},  # Additional provider-specific parameters
    "api_base": None              # Custom API endpoint, if needed
}

When performing a RAG query, you can combine these vector search, knowledge graph search, and generation settings at runtime:

from r2r import R2RClient

client = R2RClient()

response = client.rag(
    "What are the latest advancements in quantum computing?",
    rag_generation_config=rag_generation_config,
    vector_search_settings=vector_search_settings,
    kg_search_settings=kg_search_settings,
)
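Because stream is set to True in rag_generation_config above, the call may return an iterable of text chunks rather than a single completed response; the exact return type depends on your client version, so treat the following as a minimal sketch:

# Minimal sketch: consume a streamed RAG response chunk by chunk.
# Assumes the client yields text chunks when streaming is enabled;
# verify the return type for your client version before relying on this.
for chunk in response:
    print(chunk, end="", flush=True)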

R2R falls back to the server-side settings whenever no runtime overrides are provided.
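For example, a call with no overrides relies entirely on the defaults defined in r2r.toml:

# No runtime overrides: retrieval and generation both use the
# server-side defaults from r2r.toml.
response = client.rag("What are the latest advancements in quantum computing?")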

RAG Prompt Override

For specialized tasks, you can override the default RAG task prompt at runtime:

task_prompt_override = """You are an AI assistant specializing in quantum computing.
Your task is to provide a concise summary of the latest advancements in the field,
focusing on practical applications and breakthroughs from the past year."""

response = client.rag(
    "What are the latest advancements in quantum computing?",
    rag_generation_config=rag_generation_config,
    task_prompt_override=task_prompt_override
)

This prompt can also be set statically as part of the server configuration process.

Agent-based Interaction

R2R supports multi-turn conversations and complex query processing through its agent endpoint:

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What are the key differences between quantum and classical computing?"}
]

response = client.agent(
    messages=messages,
    vector_search_settings=vector_search_settings,
    kg_search_settings=kg_search_settings,
    rag_generation_config=rag_generation_config,
)

The agent can break down complex queries into sub-tasks, leveraging both retrieval and generation capabilities to provide comprehensive responses. The settings specified in the example above propagate to the agent and its tools.
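To continue a multi-turn conversation, append the agent's reply and a follow-up question to the message list and call the endpoint again. This is a minimal sketch; how you extract the assistant's reply from the previous response depends on your client version:

# Minimal sketch of a multi-turn follow-up. The placeholder below stands in
# for the assistant reply returned by the previous client.agent(...) call;
# the exact accessor depends on your client version.
messages.append({"role": "assistant", "content": "<previous agent reply>"})
messages.append({"role": "user", "content": "How do those differences affect error correction?"})

follow_up = client.agent(
    messages=messages,
    vector_search_settings=vector_search_settings,
    kg_search_settings=kg_search_settings,
    rag_generation_config=rag_generation_config,
)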

By leveraging these configuration options, you can fine-tune R2R’s retrieval and generation process to best suit your specific use case and requirements.

Next Steps

For more detailed information on configuring specific components of R2R, please refer to the following pages: