Agentic RAG

R2R’s Agentic RAG orchestrates multi-step reasoning with Retrieval-Augmented Generation (RAG). By pairing large language models with advanced retrieval and tool integrations, the agent can fetch relevant data from the web, your documents, and your knowledge graphs; reason over it; and produce robust, context-aware answers.

Agentic RAG (also called Deep Research) is an extension of R2R’s basic retrieval functionality. If you are new to R2R, we suggest starting with the Quickstart and Search & RAG docs first.

Key Features

Multi-Step Reasoning

The agent can chain multiple actions, like searching documents or referencing conversation history, before generating its final response.

Retrieval Augmentation

Integrates with R2R’s vector, full-text, or hybrid search to gather the most relevant context for each query.

Conversation Context

Maintains dialogue across multiple turns when you include a `conversation_id` in each request (see Multi-Turn Conversations below).

Tool Usage

Dynamically invoke tools at runtime to gather and analyze information from various sources.

Available Modes

The Agentic RAG system offers two primary operating modes:

RAG Mode (Default)

Standard retrieval-augmented generation for answering questions based on your knowledge base:

  • Semantic and hybrid search capabilities
  • Document-level and chunk-level content retrieval
  • Optional web search integration
  • Source citation and evidence-based responses
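
Because RAG mode is the default, a minimal call can omit most options. The sketch below is illustrative, assuming a local R2R server and server-side defaults for the generation config; fuller examples follow in Basic Usage:

```python
from r2r import R2RClient

client = R2RClient()  # assumes a local R2R server on the default port

# `mode="rag"` is the default, so it can be omitted for simple Q&A
response = client.retrieval.agent(
    message={"role": "user", "content": "What do my documents say about AI safety?"},
)
print(response.results.messages[-1].content)
```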

Research Mode

Advanced capabilities for deep analysis, reasoning, and computation:

  • All RAG mode capabilities
  • A dedicated reasoning system for complex problem-solving
  • Critique capabilities to identify potential biases or logical fallacies
  • Python execution for computational analysis
  • Multi-step reasoning for deeper exploration of topics

Available Tools

RAG Tools

The agent can use the following tools in RAG mode:

| Tool Name | Description | Dependencies |
| --- | --- | --- |
| `search_file_knowledge` | Semantic/hybrid search on your ingested documents using R2R’s search capabilities | None |
| `search_file_descriptions` | Search over file-level metadata (titles, doc-level descriptions) | None |
| `get_file_content` | Fetch entire documents or chunk structures for deeper analysis | None |
| `web_search` | Query external search APIs for up-to-date information | Requires `SERPER_API_KEY` environment variable (serper.dev) |
| `web_scrape` | Scrape and extract content from specific web pages | Requires `FIRECRAWL_API_KEY` environment variable (firecrawl.dev) |
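
The two web tools only work when their API keys are available to the R2R server. Below is a minimal, illustrative sketch of guarding tool selection on those keys; the key names come from the table above, while the client-side check is an assumption of convenience:

```python
import os

# Illustrative guard: the keys are consumed by the R2R server process,
# so checking them client-side only makes sense if client and server
# share an environment (e.g., local development).
web_tools = []
if os.environ.get("SERPER_API_KEY"):
    web_tools.append("web_search")
if os.environ.get("FIRECRAWL_API_KEY"):
    web_tools.append("web_scrape")

rag_tools = ["search_file_knowledge", "get_file_content"] + web_tools
```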

Research Tools

The agent can use the following tools in Research mode:

| Tool Name | Description | Dependencies |
| --- | --- | --- |
| `rag` | Leverage the underlying RAG agent to perform information retrieval and synthesis | None |
| `reasoning` | Call a dedicated model for complex analytical thinking | None |
| `critique` | Analyze conversation history to identify flaws, biases, and alternative approaches | None |
| `python_executor` | Execute Python code for complex calculations and analysis | None |

Basic Usage

Below are examples of how to use the agent for both single-turn queries and multi-turn conversations.

```python
from r2r import R2RClient
from r2r import (
    ThinkingEvent,
    ToolCallEvent,
    ToolResultEvent,
    CitationEvent,
    MessageEvent,
    FinalAnswerEvent,
)

client = R2RClient()  # defaults to a local R2R server
# when using auth, do client.users.login(...)

# Basic RAG mode with streaming
response = client.retrieval.agent(
    message={
        "role": "user",
        "content": "What does DeepSeek R1 imply for the future of AI?"
    },
    rag_generation_config={
        "model": "anthropic/claude-3-7-sonnet-20250219",
        "extended_thinking": True,
        "thinking_budget": 4096,
        "temperature": 1,
        "top_p": None,
        "max_tokens_to_sample": 16000,
        "stream": True
    },
    rag_tools=["search_file_knowledge", "get_file_content"],
    mode="rag"
)

# Streaming event handling
current_event_type = None
for event in response:
    # Check if the event type has changed
    event_type = type(event)
    if event_type != current_event_type:
        current_event_type = event_type
        print()  # Add newline before new event type

        # Print a label for the new event type
        if isinstance(event, ThinkingEvent):
            print("\n🧠 Thinking: ", end="", flush=True)
        elif isinstance(event, ToolCallEvent):
            print("\n🔧 Tool call: ", end="", flush=True)
        elif isinstance(event, ToolResultEvent):
            print("\n📊 Tool result: ", end="", flush=True)
        elif isinstance(event, CitationEvent):
            print("\n📑 Citation: ", end="", flush=True)
        elif isinstance(event, MessageEvent):
            print("\n💬 Message: ", end="", flush=True)
        elif isinstance(event, FinalAnswerEvent):
            print("\n✅ Final answer: ", end="", flush=True)

    # Print the content without the label
    if isinstance(event, ThinkingEvent):
        print(f"{event.data.delta.content[0].payload.value}", end="", flush=True)
    elif isinstance(event, ToolCallEvent):
        print(f"{event.data.name}({event.data.arguments})")
    elif isinstance(event, ToolResultEvent):
        print(f"{event.data.content[:60]}...")
    elif isinstance(event, CitationEvent):
        print(f"{event.data}")
    elif isinstance(event, MessageEvent):
        print(f"{event.data.delta.content[0].payload.value}", end="", flush=True)
    elif isinstance(event, FinalAnswerEvent):
        print(f"{event.data.generated_answer[:100]}...")
        print(f" Citations: {len(event.data.citations)} sources referenced")
```

Using Research Mode

Research mode provides more advanced reasoning capabilities for complex questions:

```python
# Research mode with all available tools
response = client.retrieval.agent(
    message={
        "role": "user",
        "content": "Analyze the philosophical implications of DeepSeek R1 for the future of AI reasoning"
    },
    research_generation_config={
        "model": "anthropic/claude-3-opus-20240229",
        "extended_thinking": True,
        "thinking_budget": 8192,
        "temperature": 0.2,
        "max_tokens_to_sample": 32000,
        "stream": True
    },
    research_tools=["rag", "reasoning", "critique", "python_executor"],
    mode="research"
)

# Process streaming events as shown in the previous example
# ...

# Research mode with computational focus
# This example solves a mathematical problem using the python_executor tool
compute_response = client.retrieval.agent(
    message={
        "role": "user",
        "content": "Calculate the factorial of 15 multiplied by 32. Show your work."
    },
    research_generation_config={
        "model": "anthropic/claude-3-opus-20240229",
        "max_tokens_to_sample": 1000,
        "stream": False
    },
    research_tools=["python_executor"],
    mode="research"
)

print(f"Final answer: {compute_response.results.messages[-1].content}")
```

Customizing the Agent

Tool Selection

You can customize which tools the agent has access to:

```python
# RAG mode with web capabilities
response = client.retrieval.agent(
    message={"role": "user", "content": "What are the latest developments in AI safety?"},
    rag_tools=["search_file_knowledge", "get_file_content", "web_search", "web_scrape"],
    mode="rag"
)

# Research mode with limited tools
response = client.retrieval.agent(
    message={"role": "user", "content": "Analyze the complexity of this algorithm"},
    research_tools=["reasoning", "python_executor"],  # Only reasoning and code execution
    mode="research"
)
```

Search Settings Propagation

Any search settings passed to the agent will propagate to downstream searches. This includes:

  • Filters to restrict document sources
  • Limits on the number of results
  • Hybrid search configuration
  • Collection restrictions

```python
# Using search settings with the agent
response = client.retrieval.agent(
    message={"role": "user", "content": "Summarize our Q1 financial results"},
    search_settings={
        "use_semantic_search": True,
        "filters": {"collection_ids": {"$overlap": ["e43864f5-..."]}},
        "limit": 25
    },
    rag_tools=["search_file_knowledge", "get_file_content"],
    mode="rag"
)
```

Model Selection and Parameters

You can customize the agent’s behavior by selecting different models and adjusting generation parameters:

```python
# Using a specific model with custom parameters
response = client.retrieval.agent(
    message={"role": "user", "content": "Write a concise summary of DeepSeek R1's capabilities"},
    rag_generation_config={
        "model": "anthropic/claude-3-haiku-20240307",  # Faster model for simpler tasks
        "temperature": 0.3,  # Lower temperature for more deterministic output
        "max_tokens_to_sample": 500,  # Limit response length
        "stream": False  # Non-streaming for simpler use cases
    },
    mode="rag"
)
```

Multi-Turn Conversations

You can maintain context across multiple turns using `conversation_id`. The agent will remember previous interactions and build upon them in subsequent responses.

```python
# Create a new conversation
conversation = client.conversations.create()
conversation_id = conversation.results.id

# First turn
first_response = client.retrieval.agent(
    message={"role": "user", "content": "What does DeepSeek R1 imply for the future of AI?"},
    rag_generation_config={
        "model": "anthropic/claude-3-7-sonnet-20250219",
        "temperature": 0.7,
        "max_tokens_to_sample": 1000,
        "stream": False
    },
    conversation_id=conversation_id,
    mode="rag"
)
print(f"First response: {first_response.results.messages[-1].content[:100]}...")

# Follow-up query in the same conversation
follow_up_response = client.retrieval.agent(
    message={"role": "user", "content": "How does it compare to other reasoning models?"},
    rag_generation_config={
        "model": "anthropic/claude-3-7-sonnet-20250219",
        "temperature": 0.7,
        "max_tokens_to_sample": 1000,
        "stream": False
    },
    conversation_id=conversation_id,
    mode="rag"
)
print(f"Follow-up response: {follow_up_response.results.messages[-1].content[:100]}...")

# The agent maintains context, so it knows "it" refers to DeepSeek R1
```

Performance Considerations

Based on our integration testing, here are some considerations for optimizing your agent usage:

Response Time Management

Response times vary based on the complexity of the query, the number of tools used, and the length of the requested output:

```python
# For time-sensitive applications, consider:
# 1. Using a smaller max_tokens value
# 2. Selecting faster models like claude-3-haiku
# 3. Avoiding unnecessary tools

fast_response = client.retrieval.agent(
    message={"role": "user", "content": "Give me a quick overview of DeepSeek R1"},
    rag_generation_config={
        "model": "anthropic/claude-3-haiku-20240307",  # Faster model
        "max_tokens_to_sample": 200,  # Limited output
        "stream": True  # Stream for perceived responsiveness
    },
    rag_tools=["search_file_knowledge"],  # Minimal tools
    mode="rag"
)
```

Handling Large Context

The agent can process large document contexts efficiently, but performance can be improved by using appropriate filters:

```python
# When working with large document collections, use filters to narrow results
filtered_response = client.retrieval.agent(
    message={"role": "user", "content": "Summarize key points from our AI ethics documentation"},
    search_settings={
        "filters": {
            "$and": [
                {"document_type": {"$eq": "pdf"}},
                {"metadata.category": {"$eq": "ethics"}},
                {"metadata.year": {"$gt": 2023}}
            ]
        },
        "limit": 10  # Limit number of chunks returned
    },
    rag_generation_config={
        "max_tokens_to_sample": 500,
        "stream": True
    },
    mode="rag"
)
```

How Tools Work (Under the Hood)

R2R’s Agentic RAG leverages a powerful toolset to conduct comprehensive research:

RAG Mode Tools

  • search_file_knowledge: Looks up relevant text chunks and knowledge graph data from your ingested documents using semantic and hybrid search capabilities.
  • search_file_descriptions: Searches over file-level metadata (titles, doc-level descriptions) rather than chunk content.
  • get_file_content: Fetches entire documents or their chunk structures for deeper analysis when the agent needs more comprehensive context.
  • web_search: Queries external search APIs (like Serper or Google) for live, up-to-date information from the internet. Requires a SERPER_API_KEY environment variable.
  • web_scrape: Uses Firecrawl to extract content from specific web pages for in-depth analysis. Requires a FIRECRAWL_API_KEY environment variable.

Research Mode Tools

  • rag: A specialized research tool that utilizes the underlying RAG agent to perform comprehensive information retrieval and synthesis across your data sources.
  • python_executor: Executes Python code for complex calculations, statistical operations, and algorithmic implementations, giving the agent computational capabilities.
  • reasoning: Allows the research agent to call a dedicated model as an external module for complex analytical thinking.
  • critique: Analyzes conversation history to identify potential flaws, biases, and alternative approaches to improve research rigor.

The agent combines these tools with streaming capabilities and flexible response formats. It decides which tools to use based on the query’s requirements and invokes them dynamically during the research process.
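
Since each tool invocation surfaces as a `ToolCallEvent` in the stream, you can watch these decisions happen. A short sketch, reusing the streaming setup from Basic Usage, that tallies which tools the agent chose for a query:

```python
from collections import Counter

from r2r import R2RClient, ToolCallEvent

client = R2RClient()
response = client.retrieval.agent(
    message={"role": "user", "content": "What does DeepSeek R1 imply for the future of AI?"},
    rag_generation_config={"stream": True},
    mode="rag",
)

# Count tool invocations as events stream by; the agent selects
# these tools itself based on the query
tool_counts = Counter(
    event.data.name for event in response if isinstance(event, ToolCallEvent)
)
print(dict(tool_counts))
```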

Conclusion

Agentic RAG provides a powerful approach to retrieval-augmented generation. By combining advanced search, multi-step reasoning, conversation context, and dynamic tool usage, the agent helps you build sophisticated Q&A or research solutions on your R2R-ingested data.

Next Steps