Search and RAG

Search and retrieve information using vectors, text, and RAG

R2R provides powerful search and retrieval capabilities through vector search, full-text search, hybrid search, and Retrieval-Augmented Generation (RAG). The system supports multiple search modes and extensive runtime configuration to help you find and contextualize information effectively.

Refer to the retrieval API and SDK reference for detailed retrieval examples.

These search modes, including knowledge graph-enhanced search, can be combined and tuned at runtime for more accurate and contextually relevant information retrieval.

Vector search parameters in R2R can be fine-tuned at runtime for optimal results. Here’s how to perform a basic vector search:

```python
client.retrieval.search(
    query="What is DeepSeek R1?",
)
```

Example Output:

```
AggregateSearchResult(
    chunk_search_results=[
        ChunkSearchResult(
            score=0.643,
            text="Document Title: DeepSeek_R1.pdf
            Text: could achieve an accuracy of over 70%.
            DeepSeek-R1 also delivers impressive results on IF-Eval, a benchmark designed to assess a
            models ability to follow format instructions. These improvements can be linked to the inclusion
            of instruction-following data during the final stages of supervised fine-tuning (SFT) and RL
            training. Furthermore, remarkable performance is observed on AlpacaEval2.0 and ArenaHard,
            indicating DeepSeek-R1s strengths in writing tasks and open-domain question answering. Its
            significant outperformance of DeepSeek-V3 underscores the generalization benefits of large-scale
            RL, which not only boosts reasoning capabilities but also improves performance across diverse
            domains. Moreover, the summary lengths generated by DeepSeek-R1 are concise, with an
            average of 689 tokens on ArenaHard and 2,218 characters on AlpacaEval 2.0. This indicates that
            DeepSeek-R1 avoids introducing length bias during GPT-based evaluations, further solidifying
            its robustness across multiple tasks."
        ), ...
    ],
    graph_search_results=[],
    web_search_results=[],
    context_document_results=[]
)
```

Key configurable parameters for vector search are documented in the retrieval API reference.
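As a sketch of that runtime tuning, the settings below combine parameters that appear in the examples in this guide (the exact set of supported keys is in the API reference; values here are illustrative, and `client` is assumed to be an R2RClient connected to a running R2R server):

```python
# Sketch: runtime-tunable vector search settings. Parameter names follow
# the examples in this guide; consult the retrieval API reference for the
# full list. Values are illustrative, not recommendations.
search_settings = {
    "index_measure": "l2_distance",  # distance metric used to rank vectors
    "limit": 5,                      # number of chunks to return
    "filters": {"title": {"$in": ["DeepSeek_R1.pdf"]}},  # optional metadata filter
}

def vector_search(client, query: str):
    # `client` is assumed to be an R2RClient connected to a running R2R server
    return client.retrieval.search(query=query, search_settings=search_settings)
```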

Hybrid Search

R2R supports hybrid search, which combines traditional keyword-based search with vector search for improved results. Here’s how to perform a hybrid search:

```python
client.retrieval.search(
    "What was Uber's profit in 2020?",
    search_settings={
        "index_measure": "l2_distance",
        "use_hybrid_search": True,
        "hybrid_settings": {
            "full_text_weight": 1.0,
            "semantic_weight": 5.0,
            "full_text_limit": 200,
            "rrf_k": 50,
        },
        "filters": {"title": {"$in": ["DeepSeek_R1.pdf"]}},
    },
)
```

Advanced Filtering

R2R allows you to apply filters to your search queries to narrow down results. Filters can be used to target specific documents, date ranges, or any metadata field:

```python
# Search with filters
results = client.retrieval.search(
    query="What are the effects of climate change?",
    search_settings={
        "filters": {
            "$and": [
                {"document_type": {"$eq": "pdf"}},
                {"metadata.year": {"$gt": 2020}},
            ]
        },
        "limit": 10,
    },
)
```

AI Retrieval (RAG)

R2R is built around a comprehensive Retrieval-Augmented Generation (RAG) engine, allowing you to generate contextually relevant responses based on your ingested documents. The RAG process combines all the search functionality shown above with Large Language Models to produce more accurate and informative answers.

Basic RAG

To generate a response using RAG, use the following command:

```python
client.retrieval.rag(query="What is DeepSeek R1?")
```

Example Output:

```
RAGResponse(
    generated_answer='DeepSeek-R1 is a model that demonstrates impressive performance across various tasks, leveraging reinforcement learning (RL) and supervised fine-tuning (SFT) to enhance its capabilities. It excels in writing tasks, open-domain question answering, and benchmarks like IF-Eval, AlpacaEval2.0, and ArenaHard [1], [2]. DeepSeek-R1 outperforms its predecessor, DeepSeek-V3, in several areas, showcasing its strengths in reasoning and generalization across diverse domains [1]. It also achieves competitive results on factual benchmarks like SimpleQA, although it performs worse on the Chinese SimpleQA benchmark due to safety RL constraints [2]. Additionally, DeepSeek-R1 is involved in distillation processes to transfer its reasoning capabilities to smaller models, which perform exceptionally well on benchmarks [4], [6]. The model is optimized for English and Chinese, with plans to address language mixing issues in future updates [8].',
    search_results=AggregateSearchResult(
        chunk_search_results=[ChunkSearchResult(score=0.643, text='Document Title: DeepSeek_R1.pdf...')]
    ),
    citations=[
        Citation(
            id='cit.123456',
            object='citation',
            payload=ChunkSearchResult(score=0.643, text='Document Title: DeepSeek_R1.pdf...', id='e760bb76-1c6e-52eb-910d-0ce5b567011b', document_id='e43864f5-a36f-548e-aacd-6f8d48b30c7f', owner_id='2acb499e-8428-543b-bd85-0d9098718220', collection_ids=['122fdf6a-e116-546b-a8f6-e4cb2e2c0a09'])
        )
    ],
    metadata={'id': 'chatcmpl-B0BaZ0vwIa58deI0k8NIuH6pBhngw', 'choices': [...], 'created': 1739384247, 'model': 'gpt-4o-2024-08-06', ...}
)
```

RAG with Web Search Integration

R2R supports web search integration, allowing you to enhance your RAG responses with up-to-date information from the web. To include web search in your RAG query, add the include_web_search flag:

```python
results = client.retrieval.rag(
    "What are the latest developments with DeepSeek R1?",
    include_web_search=True,
)
```

When include_web_search is set to True, the system performs a web search and includes relevant results in the context provided to the LLM, enhancing the response with the most current information available online.

R2R also supports hybrid search in RAG, combining the power of vector search and keyword-based search. To use hybrid search in RAG, add the use_hybrid_search flag to your search settings:

```python
results = client.retrieval.rag("Who is Jon Snow?", {"use_hybrid_search": True})
```

This example demonstrates how hybrid search can enhance the RAG process by combining semantic understanding with keyword matching, potentially providing more accurate and comprehensive results.

Streaming RAG

R2R also supports streaming RAG responses, which can be useful for real-time applications.

When using streaming RAG, you’ll receive different types of events:

  1. SearchResultsEvent - Contains the initial search results from your documents
  2. MessageEvent - Streams partial tokens of the response as they are generated
  3. CitationEvent - Indicates when a citation is added to the response, with relevant metadata including:
    • id - Unique identifier for the citation
    • object - Always “citation”
    • source_type - The type of source (chunk, graph, web, etc.)
    • source_title - Title of the source document when available
  4. FinalAnswerEvent - Contains the complete generated answer and structured citations
  5. ThinkingEvent - For reasoning agents, contains the model’s step-by-step reasoning process

The citations in the final response are structured objects that link specific passages in the response to their source documents, enabling proper attribution and verification. To use streaming RAG:

```python
from r2r import (
    CitationEvent,
    FinalAnswerEvent,
    MessageEvent,
    SearchResultsEvent,
    R2RClient,
)

client = R2RClient()  # connects to your R2R deployment (default base URL)

result_stream = client.retrieval.rag(
    query="What is DeepSeek R1?",
    search_settings={"limit": 25},
    rag_generation_config={"stream": True},
    include_web_search=True,  # Optional: include web search results
)

# You can also switch on the `type` field of each event instead of isinstance
for event in result_stream:
    if isinstance(event, SearchResultsEvent):
        print("Search results:", event.data)
    elif isinstance(event, MessageEvent):
        print("Partial message:", event.data.delta)
    elif isinstance(event, CitationEvent):
        print("New citation detected:", event.data)
    elif isinstance(event, FinalAnswerEvent):
        print("Final answer:", event.data.generated_answer)
        print("Citations:", event.data.citations)
```

Example Output:

```
Search results: id='run_1' object='rag.search_results' data={'chunk_search_results': [{'id': '1e40ee7e-2eef-524f-b5c6-1a1910e73ccc', 'document_id': '652075c0-3a43-519f-9625-f581e7605bc5', 'owner_id': '2acb499e-8428-543b-bd85-0d9098718220', 'collection_ids': ['122fdf6a-e116-546b-a8f6-e4cb2e2c0a09'], 'score': 0.7945216641038179, 'text': 'data, achieving strong performance across various tasks. DeepSeek-R1 is more powerful,\nleveraging cold-start data alongside iterative RL fine-tuning. Ultimately ...
...
Partial message: {'content': [MessageDelta(type='text', text={'value': 'Deep', 'annotations': []})]}
Partial message: {'content': [MessageDelta(type='text', text={'value': 'Seek', 'annotations': []})]}
Partial message: {'content': [MessageDelta(type='text', text={'value': '-R', 'annotations': []})]}
...
New citation detected: {'id':..., 'object':...}
...
Final answer: DeepSeek-R1 is a large language model developed by the DeepSeek-AI research team. It is a reasoning model that has been trained using multi-stage training and cold-start data before reinforcement learning (RL). The model demonstrates superior performance on various benchmarks, including MMLU, MMLU-Pro, GPQA Diamond, and FRAMES, particularly in STEM-related questions. ...
```

Customizing RAG

R2R offers extensive customization options for its Retrieval-Augmented Generation (RAG) functionality:

  1. Search Settings: Customize vector and knowledge graph search parameters using VectorSearchSettings and KGSearchSettings.

  2. Generation Config: Fine-tune the language model’s behavior with GenerationConfig, including:

    • Temperature, top_p, top_k for controlling randomness
    • Max tokens, model selection, and streaming options
    • Advanced settings like beam search and sampling strategies
  3. Web Search Integration: Enable web search to supplement your knowledge base with:

    • include_web_search: Boolean flag to include web search results
    • Automatic merging of web results with your document corpus
  4. Multiple LLM Support: Easily switch between different language models and providers:

    • OpenAI models (default)
    • Anthropic’s Claude models
    • Local models via Ollama
    • Any provider supported by LiteLLM
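
Putting several of these knobs together, a fuller rag_generation_config might look like the sketch below (values are illustrative, not recommendations; the model string and max_tokens_to_sample key follow the other examples in this guide, and `client` is assumed to be an R2RClient connected to a running R2R server):

```python
# Sketch: a fuller rag_generation_config combining the options listed above.
# Values are illustrative; any LiteLLM-supported model string works.
rag_generation_config = {
    "model": "anthropic/claude-3-haiku-20240307",  # provider/model via LiteLLM
    "temperature": 0.2,            # lower = less random output
    "top_p": 0.9,                  # nucleus sampling cutoff
    "max_tokens_to_sample": 1024,  # cap on generated tokens
    "stream": False,               # set True for streaming responses
}

def customized_rag(client, query: str):
    # `client` is assumed to be an R2RClient connected to a running R2R server
    return client.retrieval.rag(query, rag_generation_config=rag_generation_config)
```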

Example of customizing the model with web search:

```python
# requires that ANTHROPIC_API_KEY is set
response = client.retrieval.rag(
    "Who was Aristotle and what are his recent influences?",
    rag_generation_config={"model": "anthropic/claude-3-haiku-20240307", "stream": True},
    include_web_search=True,  # Include web search for up-to-date information
)
for chunk in response:
    print(chunk)
```

This flexibility allows you to optimize RAG performance for your specific use case and leverage the strengths of various LLM providers while incorporating the latest information from the web.

Streaming Agent (Deep Research Mode)

R2R offers a powerful agentic retrieval mode that performs in-depth analysis of documents through iterative research and reasoning. This mode can replicate Deep Research-like results by leveraging a variety of tools to thoroughly investigate your data and the web:

```python
# Event classes used below; ToolCallEvent and ToolResultEvent are assumed
# to be exported from r2r alongside the other event types.
from r2r import (
    CitationEvent,
    FinalAnswerEvent,
    MessageEvent,
    ThinkingEvent,
    ToolCallEvent,
    ToolResultEvent,
)

results = client.retrieval.agent(
    query="What does deepseek r1 imply for the future of AI?",
    generation_config={
        "model": "anthropic/claude-3-7-sonnet-20250219",
        "extended_thinking": True,
        "thinking_budget": 4096,
        "temperature": 1,
        "top_p": None,
        "max_tokens_to_sample": 16000,
        "stream": True,
    },
    mode="research",
    rag_tools=["web_search", "web_scrape"],  # Include web search and scrape tools
)

# Process the streaming events
for event in results:
    if isinstance(event, ThinkingEvent):
        print(f"🧠 Thinking: {event.data.delta.content[0].payload.value}")
    elif isinstance(event, ToolCallEvent):
        print(f"🔧 Tool call: {event.data.name}({event.data.arguments})")
    elif isinstance(event, ToolResultEvent):
        print(f"📊 Tool result: {event.data.content[:60]}...")
    elif isinstance(event, CitationEvent):
        print(f"📑 Citation: {event.data}")
    elif isinstance(event, MessageEvent):
        print(f"💬 Message: {event.data.delta.content[0].payload.value}")
    elif isinstance(event, FinalAnswerEvent):
        print(f"✅ Final answer: {event.data.generated_answer[:100]}...")
        print(f"   Citations: {len(event.data.citations)} sources referenced")
```

Example of streaming output:

```
🧠 Thinking: Analyzing the query about DeepSeek R1 implications...
🔧 Tool call: search({"query":"DeepSeek R1 capabilities advancements"})
📊 Tool result: DeepSeek-R1 is a reasoning-focused LLM that uses reinforcement learning...
🧠 Thinking: The search provides valuable information about DeepSeek R1's capabilities
🧠 Thinking: Need more specific information about its performance in reasoning tasks
🔧 Tool call: search({"query":"DeepSeek R1 reasoning benchmarks performance"})
📊 Tool result: DeepSeek-R1 achieves strong results on reasoning benchmarks including MMLU...
📑 Citation: cit.a1b2c3, source: DeepSeek_R1.pdf
🧠 Thinking: Now I need to understand the implications for AI development
🔧 Tool call: web_search({"query":"AI reasoning capabilities future development"})
📊 Tool result: Advanced reasoning capabilities are considered a key milestone toward...
📑 Citation: cit.d4e5f6, source: AI_Roadmap_2024.pdf
💬 Message: DeepSeek-R1 has several important implications for the future of AI development:
💬 Message: 1. **Reinforcement Learning as a Key Approach**: DeepSeek-R1's success demonstrates...
📑 Citation: cit.g7h8i9, source: DeepSeek_R1.pdf
💬 Message: 2. **Efficiency Through Distillation**: The model shows that reasoning capabilities...
✅ Final answer: DeepSeek-R1 has several important implications for the future of AI development: 1. Reinforcement Learning...
   Citations: 3 sources referenced
```

Knowledge Graph Search

R2R provides knowledge graph integration with its search capabilities, allowing for more contextually rich results by leveraging entity and relationship information.

```python
client.retrieval.search(
    "What was DeepSeek R1",
    graph_search_settings={
        "use_graph_search": True,
        "kg_search_type": "local",
    },
)
```

Behind the scenes, R2R’s RetrievalService handles RAG requests, combining the power of vector search, optional knowledge graph integration, web search integration, and language model generation.

Conclusion

R2R’s search and RAG capabilities provide flexible tools for finding and contextualizing information. Whether you need simple semantic search or complex hybrid retrieval with custom RAG generation and web search integration, the system can be configured to meet your specific needs.

For more advanced use cases: