
RAG Customization

RAG (Retrieval-Augmented Generation) in R2R can be extensively customized to suit various use cases. The main components for customization are:

Generation Configuration: Control the language model’s behavior.
Search Settings: Fine-tune the retrieval process.
Task Prompt Override: Customize the RAG task prompt for specific tasks.

LLM Provider Configuration

Refer to the LLM configuration page.

Retrieval Configuration

Refer to the retrieval configuration page.

Combining LLM and Retrieval Configuration for RAG

The rag_generation_config parameter lets you customize the language model's behavior. Default settings are defined server-side in r2r.toml, as described in the previous configuration guides, and can be overridden at runtime. The example below defines a graph_settings dictionary for knowledge graph search and a rag_generation_config dictionary for generation:

# Configure graphRAG search
graph_settings = {
    "enabled": True,
    "generation_config": {
        "model": "gpt-4",
        "temperature": 0.1
    },
    "entity_types": ["Person", "Organization"],
    "relationships": ["worksFor", "foundedBy"],
    "max_community_description_length": 65536,
    "max_llm_queries_for_global_search": 250,
    "local_limits": {"__Entity__": 20, "__Relationship__": 20, "__Community__": 20}
}

# Configure LLM generation
rag_generation_config = {
    "model": "anthropic/claude-3-opus-20240229",
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens_to_sample": 1500,
    "stream": True,
    "functions": None,  # For function calling, if supported
    "tools": None,  # For tool use, if supported
    "add_generation_kwargs": {},  # Additional provider-specific parameters
    "api_base": None  # Custom API endpoint, if needed
}

When performing a RAG query, you can combine semantic (vector) search, knowledge graph search, and generation settings at runtime:

from r2r import R2RClient

client = R2RClient()

response = client.retrieval.rag(
    "What are the latest advancements in quantum computing?",
    rag_generation_config=rag_generation_config,
    search_settings={
        "use_semantic_search": True,
        "limit": 20,
        "use_hybrid_search": True,
        "graph_settings": graph_settings
    }
)
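Setting `use_hybrid_search` combines semantic (vector) results with full-text results. A common way to fuse two ranked lists is reciprocal rank fusion (RRF), where a document earns `weight / (k + rank)` from every list it appears in. The sketch below is a generic weighted RRF for illustration only: the equal weights, the `k = 60` constant, and the document IDs are assumptions, and this is not R2R's own implementation.

```python
def weighted_rrf(
    semantic_ranking: list[str],
    full_text_ranking: list[str],
    semantic_weight: float = 1.0,
    full_text_weight: float = 1.0,
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse two ranked lists of document IDs with weighted reciprocal rank fusion."""
    scores: dict[str, float] = {}
    # Ranks are 1-based, so the top result in a list contributes weight / (k + 1)
    for rank, doc_id in enumerate(semantic_ranking, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + semantic_weight / (k + rank)
    for rank, doc_id in enumerate(full_text_ranking, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + full_text_weight / (k + rank)
    # Sort by fused score, highest first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


semantic = ["doc_a", "doc_b", "doc_c"]   # e.g. vector-similarity order
full_text = ["doc_c", "doc_a", "doc_d"]  # e.g. keyword-match order
fused = weighted_rrf(semantic, full_text)
print([doc_id for doc_id, _ in fused])  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that rank well in both lists (here `doc_a` and `doc_c`) rise above documents found by only one method, which is the main reason to enable hybrid search.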

If no runtime overrides are provided, R2R falls back to the server-side settings.

RAG Prompt Override

For specialized tasks, you can override the default RAG task prompt at runtime:

task_prompt_override = """You are an AI assistant specializing in quantum computing.
Your task is to provide a concise summary of the latest advancements in the field,
focusing on practical applications and breakthroughs from the past year."""

response = client.retrieval.rag(
    "What are the latest advancements in quantum computing?",
    rag_generation_config=rag_generation_config,
    task_prompt_override=task_prompt_override
)
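If an override needs the retrieved passages or the user's question, it helps to think of it as a template filled in before the request reaches the model. The sketch below assumes hypothetical `{query}` and `{context}` placeholders filled with Python's `str.format`; these names are illustrative, so check R2R's default RAG prompt for the placeholders the server actually substitutes before relying on them.

```python
def render_task_prompt(template: str, query: str, passages: list[str]) -> str:
    """Fill a task prompt template with a query and numbered context passages.

    Assumes str.format-style placeholders named {query} and {context};
    these names are illustrative, not confirmed R2R placeholders.
    """
    # Number each passage so the model can cite them as [1], [2], ...
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    return template.format(query=query, context=context)


template = """You are an AI assistant specializing in quantum computing.
Summarize the latest advancements, citing the numbered context.

Query: {query}

Context:
{context}"""

prompt = render_task_prompt(
    template,
    "What are the latest advancements in quantum computing?",
    ["Placeholder passage one.", "Placeholder passage two."],
)
print(prompt)
```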

This prompt can also be set statically as part of the server configuration.

Agent-based Interaction

R2R supports multi-turn conversations and complex query processing through its agent endpoint:

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What are the key differences between quantum and classical computing?"}
]

# Vector search settings (same values as the RAG example above)
vector_search_settings = {
    "use_semantic_search": True,
    "limit": 20,
    "use_hybrid_search": True,
}

response = client.retrieval.agent(
    messages=messages,
    vector_search_settings=vector_search_settings,
    graph_settings=graph_settings,
    rag_generation_config=rag_generation_config,
)
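In a multi-turn conversation the `messages` list grows with each exchange: the assistant's reply is appended, followed by the next user question, and the full history is sent on the next call. The helper below is a hypothetical sketch of that bookkeeping using the same role/content dictionaries as the example above; `add_turn` is not an R2R function, it accepts only three roles by assumption, and how you extract the reply text from the agent response depends on your client version, so a placeholder string stands in for it.

```python
# Roles this sketch accepts (an assumption; your provider may allow others)
ALLOWED_ROLES = {"system", "user", "assistant"}


def add_turn(messages: list[dict[str, str]], role: str, content: str) -> list[dict[str, str]]:
    """Return a new history with one message appended (the input list is unchanged)."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    return messages + [{"role": role, "content": content}]


history = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What are the key differences between quantum and classical computing?"},
]

# Placeholder reply text; in practice, take it from the agent response
history = add_turn(history, "assistant", "<assistant reply from the previous call>")
history = add_turn(history, "user", "Which of those differences matter most for cryptography?")

print([m["role"] for m in history])
```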

The agent can break complex queries into sub-tasks, using both retrieval and generation to produce comprehensive responses. The settings passed in the example above propagate to the agent and its tools.
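Settings propagation can be pictured as the agent holding one set of settings and passing the same values to every tool call it makes. The toy agent below is a conceptual sketch only: `ToyAgent`, the stub `search_tool`, and the naive split into sub-questions are all hypothetical and do not reflect how R2R's agent actually plans or executes tool calls.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Tool = Callable[[str, dict[str, Any]], str]


def search_tool(sub_query: str, search_settings: dict[str, Any]) -> str:
    """Stub tool: reports which settings it received instead of searching."""
    return f"searched {sub_query!r} with limit={search_settings['limit']}"


@dataclass
class ToyAgent:
    search_settings: dict[str, Any]
    tools: list[Tool] = field(default_factory=list)

    def run(self, query: str) -> list[str]:
        # Naive decomposition: split a compound question on " and "
        sub_queries = [part.strip() for part in query.split(" and ")]
        results = []
        for sub_query in sub_queries:
            for tool in self.tools:
                # Every tool call receives the agent-level settings unchanged
                results.append(tool(sub_query, self.search_settings))
        return results


agent = ToyAgent(search_settings={"limit": 20}, tools=[search_tool])
for line in agent.run("explain qubits and compare them with classical bits"):
    print(line)
```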

By leveraging these configuration options, you can fine-tune R2R’s retrieval and generation process to best suit your specific use case and requirements.
