Agentic RAG

R2R’s Agentic RAG orchestrates multi-step reasoning with Retrieval-Augmented Generation (RAG). By pairing large language models with advanced retrieval and tool integrations, the agent can fetch relevant data from the web, your documents, and your knowledge graphs; reason over it; and produce robust, context-aware answers.

Agentic RAG (also called Deep Research) is an extension of R2R’s basic retrieval functionality. If you are new to R2R, we suggest starting with the Quickstart and Search & RAG docs first.

Key Features

Multi-Step Reasoning

The agent can chain multiple actions, like searching documents or referencing conversation history, before generating its final response.

Retrieval Augmentation

Integrates with R2R’s vector, full-text, or hybrid search to gather the most relevant context for each query.

Conversation Context

Maintains dialogue across multiple turns when you include a `conversation_id` in each request (see Multi-Turn Conversations below).

Tool Usage

Dynamically invoke tools at runtime to gather and analyze information from various sources.

Available Modes

The Agentic RAG system offers two primary operating modes:

RAG Mode (Default)

Standard retrieval-augmented generation for answering questions based on your knowledge base:

  • Semantic and hybrid search capabilities
  • Document-level and chunk-level content retrieval
  • Optional web search integration
  • Source citation and evidence-based responses
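
Because RAG mode is the default, a minimal call can omit most options. The sketch below is illustrative, assuming a local R2R server and server-side defaults for the generation config; fuller examples follow in Basic Usage:

```python
from r2r import R2RClient

client = R2RClient()  # assumes a local R2R server on the default port

# `mode="rag"` is the default, so it can be omitted for simple Q&A
response = client.retrieval.agent(
    message={"role": "user", "content": "What do my documents say about AI safety?"},
)
print(response.results.messages[-1].content)
```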

Research Mode

Advanced capabilities for deep analysis, reasoning, and computation:

  • All RAG mode capabilities
  • A dedicated reasoning system for complex problem-solving
  • Critique capabilities to identify potential biases or logical fallacies
  • Python execution for computational analysis
  • Multi-step reasoning for deeper exploration of topics

Available Tools

RAG Tools

The agent can use the following tools in RAG mode:

| Tool Name | Description | Dependencies |
| --- | --- | --- |
| `search_file_knowledge` | Semantic/hybrid search on your ingested documents using R2R’s search capabilities | None |
| `search_file_descriptions` | Search over file-level metadata (titles, doc-level descriptions) | None |
| `get_file_content` | Fetch entire documents or chunk structures for deeper analysis | None |
| `web_search` | Query external search APIs for up-to-date information | Requires `SERPER_API_KEY` environment variable (serper.dev) |
| `web_scrape` | Scrape and extract content from specific web pages | Requires `FIRECRAWL_API_KEY` environment variable (firecrawl.dev) |
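
The two web tools only work when their API keys are available to the R2R server. Below is a minimal, illustrative sketch of guarding tool selection on those keys; the key names come from the table above, while the client-side check is an assumption of convenience:

```python
import os

# Illustrative guard: the keys are consumed by the R2R server process,
# so checking them client-side only makes sense if client and server
# share an environment (e.g., local development).
web_tools = []
if os.environ.get("SERPER_API_KEY"):
    web_tools.append("web_search")
if os.environ.get("FIRECRAWL_API_KEY"):
    web_tools.append("web_scrape")

rag_tools = ["search_file_knowledge", "get_file_content"] + web_tools
```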

Research Tools

The agent can use the following tools in Research mode:

| Tool Name | Description | Dependencies |
| --- | --- | --- |
| `rag` | Leverage the underlying RAG agent to perform information retrieval and synthesis | None |
| `reasoning` | Call a dedicated model for complex analytical thinking | None |
| `critique` | Analyze conversation history to identify flaws, biases, and alternative approaches | None |
| `python_executor` | Execute Python code for complex calculations and analysis | None |

Basic Usage

Below are examples of how to use the agent for both single-turn queries and multi-turn conversations.

```python
from r2r import R2RClient
from r2r import (
    ThinkingEvent,
    ToolCallEvent,
    ToolResultEvent,
    CitationEvent,
    MessageEvent,
    FinalAnswerEvent,
)

client = R2RClient()  # defaults to a local R2R server
# when using auth, do client.users.login(...)

# Basic RAG mode with streaming
response = client.retrieval.agent(
    message={
        "role": "user",
        "content": "What does DeepSeek R1 imply for the future of AI?"
    },
    rag_generation_config={
        "model": "anthropic/claude-3-7-sonnet-20250219",
        "extended_thinking": True,
        "thinking_budget": 4096,
        "temperature": 1,
        "top_p": None,
        "max_tokens_to_sample": 16000,
        "stream": True
    },
    rag_tools=["search_file_knowledge", "get_file_content"],
    mode="rag"
)

# Streaming event handling
current_event_type = None
for event in response:
    # Check if the event type has changed
    event_type = type(event)
    if event_type != current_event_type:
        current_event_type = event_type
        print()  # Add newline before new event type

        # Print a label for the new event type
        if isinstance(event, ThinkingEvent):
            print("\n🧠 Thinking: ", end="", flush=True)
        elif isinstance(event, ToolCallEvent):
            print("\n🔧 Tool call: ", end="", flush=True)
        elif isinstance(event, ToolResultEvent):
            print("\n📊 Tool result: ", end="", flush=True)
        elif isinstance(event, CitationEvent):
            print("\n📑 Citation: ", end="", flush=True)
        elif isinstance(event, MessageEvent):
            print("\n💬 Message: ", end="", flush=True)
        elif isinstance(event, FinalAnswerEvent):
            print("\n✅ Final answer: ", end="", flush=True)

    # Print the content without the label
    if isinstance(event, ThinkingEvent):
        print(f"{event.data.delta.content[0].payload.value}", end="", flush=True)
    elif isinstance(event, ToolCallEvent):
        print(f"{event.data.name}({event.data.arguments})")
    elif isinstance(event, ToolResultEvent):
        print(f"{event.data.content[:60]}...")
    elif isinstance(event, CitationEvent):
        print(f"{event.data}")
    elif isinstance(event, MessageEvent):
        print(f"{event.data.delta.content[0].payload.value}", end="", flush=True)
    elif isinstance(event, FinalAnswerEvent):
        print(f"{event.data.generated_answer[:100]}...")
        print(f" Citations: {len(event.data.citations)} sources referenced")
```

Using Research Mode

Research mode provides more advanced reasoning capabilities for complex questions:

```python
# Research mode with all available tools
response = client.retrieval.agent(
    message={
        "role": "user",
        "content": "Analyze the philosophical implications of DeepSeek R1 for the future of AI reasoning"
    },
    research_generation_config={
        "model": "anthropic/claude-3-opus-20240229",
        "extended_thinking": True,
        "thinking_budget": 8192,
        "temperature": 0.2,
        "max_tokens_to_sample": 32000,
        "stream": True
    },
    research_tools=["rag", "reasoning", "critique", "python_executor"],
    mode="research"
)

# Process streaming events as shown in the previous example
# ...

# Research mode with computational focus
# This example solves a mathematical problem using the python_executor tool
compute_response = client.retrieval.agent(
    message={
        "role": "user",
        "content": "Calculate the factorial of 15 multiplied by 32. Show your work."
    },
    research_generation_config={
        "model": "anthropic/claude-3-opus-20240229",
        "max_tokens_to_sample": 1000,
        "stream": False
    },
    research_tools=["python_executor"],
    mode="research"
)

print(f"Final answer: {compute_response.results.messages[-1].content}")
```

Customizing the Agent

Tool Selection

You can customize which tools the agent has access to:

```python
# RAG mode with web capabilities
response = client.retrieval.agent(
    message={"role": "user", "content": "What are the latest developments in AI safety?"},
    rag_tools=["search_file_knowledge", "get_file_content", "web_search", "web_scrape"],
    mode="rag"
)

# Research mode with limited tools
response = client.retrieval.agent(
    message={"role": "user", "content": "Analyze the complexity of this algorithm"},
    research_tools=["reasoning", "python_executor"],  # Only reasoning and code execution
    mode="research"
)
```

Search Settings Propagation

Any search settings passed to the agent will propagate to downstream searches. This includes:

  • Filters to restrict document sources
  • Limits on the number of results
  • Hybrid search configuration
  • Collection restrictions

```python
# Using search settings with the agent
response = client.retrieval.agent(
    message={"role": "user", "content": "Summarize our Q1 financial results"},
    search_settings={
        "use_semantic_search": True,
        "filters": {"collection_ids": {"$overlap": ["e43864f5-..."]}},
        "limit": 25
    },
    rag_tools=["search_file_knowledge", "get_file_content"],
    mode="rag"
)
```

Model Selection and Parameters

You can customize the agent’s behavior by selecting different models and adjusting generation parameters:

```python
# Using a specific model with custom parameters
response = client.retrieval.agent(
    message={"role": "user", "content": "Write a concise summary of DeepSeek R1's capabilities"},
    rag_generation_config={
        "model": "anthropic/claude-3-haiku-20240307",  # Faster model for simpler tasks
        "temperature": 0.3,  # Lower temperature for more deterministic output
        "max_tokens_to_sample": 500,  # Limit response length
        "stream": False  # Non-streaming for simpler use cases
    },
    mode="rag"
)
```

Multi-Turn Conversations

You can maintain context across multiple turns using `conversation_id`. The agent will remember previous interactions and build upon them in subsequent responses.

```python
# Create a new conversation
conversation = client.conversations.create()
conversation_id = conversation.results.id

# First turn
first_response = client.retrieval.agent(
    message={"role": "user", "content": "What does DeepSeek R1 imply for the future of AI?"},
    rag_generation_config={
        "model": "anthropic/claude-3-7-sonnet-20250219",
        "temperature": 0.7,
        "max_tokens_to_sample": 1000,
        "stream": False
    },
    conversation_id=conversation_id,
    mode="rag"
)
print(f"First response: {first_response.results.messages[-1].content[:100]}...")

# Follow-up query in the same conversation
follow_up_response = client.retrieval.agent(
    message={"role": "user", "content": "How does it compare to other reasoning models?"},
    rag_generation_config={
        "model": "anthropic/claude-3-7-sonnet-20250219",
        "temperature": 0.7,
        "max_tokens_to_sample": 1000,
        "stream": False
    },
    conversation_id=conversation_id,
    mode="rag"
)
print(f"Follow-up response: {follow_up_response.results.messages[-1].content[:100]}...")

# The agent maintains context, so it knows "it" refers to DeepSeek R1
```

Performance Considerations

Based on our integration testing, here are some considerations for optimizing your agent usage:

Response Time Management

Response times vary based on the complexity of the query, the number of tools used, and the length of the requested output:

```python
# For time-sensitive applications, consider:
# 1. Using a smaller max_tokens value
# 2. Selecting faster models like claude-3-haiku
# 3. Avoiding unnecessary tools

fast_response = client.retrieval.agent(
    message={"role": "user", "content": "Give me a quick overview of DeepSeek R1"},
    rag_generation_config={
        "model": "anthropic/claude-3-haiku-20240307",  # Faster model
        "max_tokens_to_sample": 200,  # Limited output
        "stream": True  # Stream for perceived responsiveness
    },
    rag_tools=["search_file_knowledge"],  # Minimal tools
    mode="rag"
)
```

Handling Large Context

The agent can process large document contexts efficiently, but performance can be improved by using appropriate filters:

```python
# When working with large document collections, use filters to narrow results
filtered_response = client.retrieval.agent(
    message={"role": "user", "content": "Summarize key points from our AI ethics documentation"},
    search_settings={
        "filters": {
            "$and": [
                {"document_type": {"$eq": "pdf"}},
                {"metadata.category": {"$eq": "ethics"}},
                {"metadata.year": {"$gt": 2023}}
            ]
        },
        "limit": 10  # Limit number of chunks returned
    },
    rag_generation_config={
        "max_tokens_to_sample": 500,
        "stream": True
    },
    mode="rag"
)
```

How Tools Work (Under the Hood)

R2R’s Agentic RAG leverages a powerful toolset to conduct comprehensive research:

RAG Mode Tools

  • search_file_knowledge: Looks up relevant text chunks and knowledge graph data from your ingested documents using semantic and hybrid search capabilities.
  • search_file_descriptions: Searches over file-level metadata (titles, doc-level descriptions) rather than chunk content.
  • get_file_content: Fetches entire documents or their chunk structures for deeper analysis when the agent needs more comprehensive context.
  • web_search: Queries external search APIs (like Serper or Google) for live, up-to-date information from the internet. Requires a SERPER_API_KEY environment variable.
  • web_scrape: Uses Firecrawl to extract content from specific web pages for in-depth analysis. Requires a FIRECRAWL_API_KEY environment variable.

Research Mode Tools

  • rag: A specialized research tool that utilizes the underlying RAG agent to perform comprehensive information retrieval and synthesis across your data sources.
  • python_executor: Executes Python code for complex calculations, statistical operations, and algorithmic implementations, giving the agent computational capabilities.
  • reasoning: Allows the research agent to call a dedicated model as an external module for complex analytical thinking.
  • critique: Analyzes conversation history to identify potential flaws, biases, and alternative approaches to improve research rigor.

The agent combines these tools with streaming capabilities and flexible response formats. It decides which tools to use based on the query’s requirements and invokes them dynamically during the research process.
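
Since each tool invocation surfaces as a `ToolCallEvent` in the stream, you can watch these decisions happen. A short sketch, reusing the streaming setup from Basic Usage, that tallies which tools the agent chose for a query:

```python
from collections import Counter

from r2r import R2RClient, ToolCallEvent

client = R2RClient()
response = client.retrieval.agent(
    message={"role": "user", "content": "What does DeepSeek R1 imply for the future of AI?"},
    rag_generation_config={"stream": True},
    mode="rag",
)

# Count tool invocations as events stream by; the agent selects
# these tools itself based on the query
tool_counts = Counter(
    event.data.name for event in response if isinstance(event, ToolCallEvent)
)
print(dict(tool_counts))
```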

Conclusion

Agentic RAG provides a powerful approach to retrieval-augmented generation. By combining advanced search, multi-step reasoning, conversation context, and dynamic tool usage, the agent helps you build sophisticated Q&A or research solutions on your R2R-ingested data.

Next Steps