Learn how to configure RAG in your R2R deployment

RAG Customization

RAG (Retrieval-Augmented Generation) in R2R can be extensively customized to suit various use cases. The main components for customization are:

  1. Generation Configuration: Control the language model’s behavior.
  2. Search Settings: Fine-tune the retrieval process.
  3. Task Prompt Override: Customize the system prompt for specific tasks.

LLM Provider Configuration

Refer to the LLM configuration page here.

Retrieval Configuration

Refer to the retrieval configuration page here.

Combining LLM and Retrieval Configuration for RAG

The rag_generation_config parameter allows you to customize the language model's behavior, while vector_search_settings and kg_search_settings control the retrieval process. Default settings are defined server-side in r2r.toml, as described in the previous configuration guides, and can be overridden at runtime as shown below:

# Configure vector search
vector_search_settings = {
    "use_vector_search": True,
    "search_limit": 20,
    "use_hybrid_search": True,
    "selected_collection_ids": ["c3291abf-8a4e-5d9d-80fd-232ef6fd8526"]
}

# Configure GraphRAG search
kg_search_settings = {
    "use_kg_search": True,
    "kg_search_type": "local",
    "kg_search_level": None,
    "generation_config": {
        "model": "gpt-4",
        "temperature": 0.1
    },
    "entity_types": ["Person", "Organization"],
    "relationships": ["worksFor", "foundedBy"],
    "max_community_description_length": 65536,
    "max_llm_queries_for_global_search": 250,
    "local_search_limits": {"__Entity__": 20, "__Relationship__": 20, "__Community__": 20}
}

# Configure LLM generation
rag_generation_config = {
    "model": "anthropic/claude-3-opus-20240229",
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens_to_sample": 1500,
    "stream": True,
    "functions": None,            # For function calling, if supported
    "tools": None,                # For tool use, if supported
    "add_generation_kwargs": {},  # Additional provider-specific parameters
    "api_base": None              # Custom API endpoint, if needed
}

When performing a RAG query, you can combine these vector search, knowledge graph search, and generation settings at runtime:

from r2r import R2RClient

client = R2RClient()

response = client.rag(
    "What are the latest advancements in quantum computing?",
    rag_generation_config=rag_generation_config,
    vector_search_settings=vector_search_settings,
    kg_search_settings=kg_search_settings,
)
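Because stream is set to True in rag_generation_config above, the call may return an iterable of text chunks rather than a single completed response; the exact return type depends on your client version, so treat the following as a minimal sketch:

# Minimal sketch: consume a streamed RAG response chunk by chunk.
# Assumes the client yields text chunks when streaming is enabled;
# verify the return type for your client version before relying on this.
for chunk in response:
    print(chunk, end="", flush=True)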

R2R falls back to the server-side settings whenever no runtime overrides are provided.
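For example, a call with no overrides relies entirely on the defaults defined in r2r.toml:

# No runtime overrides: retrieval and generation both use the
# server-side defaults from r2r.toml.
response = client.rag("What are the latest advancements in quantum computing?")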

RAG Prompt Override

For specialized tasks, you can override the default RAG task prompt at runtime:

task_prompt_override = """You are an AI assistant specializing in quantum computing.
Your task is to provide a concise summary of the latest advancements in the field,
focusing on practical applications and breakthroughs from the past year."""

response = client.rag(
    "What are the latest advancements in quantum computing?",
    rag_generation_config=rag_generation_config,
    task_prompt_override=task_prompt_override
)

This prompt can also be set statically as part of the server configuration process.

Agent-based Interaction

R2R supports multi-turn conversations and complex query processing through its agent endpoint:

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What are the key differences between quantum and classical computing?"}
]

response = client.agent(
    messages=messages,
    vector_search_settings=vector_search_settings,
    kg_search_settings=kg_search_settings,
    rag_generation_config=rag_generation_config,
)

The agent can break down complex queries into sub-tasks, leveraging both retrieval and generation capabilities to provide comprehensive responses. The settings specified in the example above propagate to the agent and its tools.
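To continue a multi-turn conversation, append the agent's reply and a follow-up question to the message list and call the endpoint again. This is a minimal sketch; how you extract the assistant's reply from the previous response depends on your client version:

# Minimal sketch of a multi-turn follow-up. The placeholder below stands in
# for the assistant reply returned by the previous client.agent(...) call;
# the exact accessor depends on your client version.
messages.append({"role": "assistant", "content": "<previous agent reply>"})
messages.append({"role": "user", "content": "How do those differences affect error correction?"})

follow_up = client.agent(
    messages=messages,
    vector_search_settings=vector_search_settings,
    kg_search_settings=kg_search_settings,
    rag_generation_config=rag_generation_config,
)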

By leveraging these configuration options, you can fine-tune R2R’s retrieval and generation process to best suit your specific use case and requirements.

Next Steps

For more detailed information on configuring specific components of R2R, please refer to the following pages: