Search and RAG

R2R provides powerful search and retrieval capabilities through vector search, full-text search, hybrid search, and Retrieval-Augmented Generation (RAG). The system supports multiple search modes and extensive runtime configuration to help you find and contextualize information effectively.

Refer to the retrieval API and SDK reference for detailed retrieval examples.

Search Modes and Settings

When using the Search (/retrieval/search) or RAG (/retrieval/rag) endpoints, you control the retrieval process using search_mode and search_settings.

  • search_mode (Optional, defaults to custom): Choose between pre-configured modes or full customization.
    • basic: Defaults to a simple semantic search configuration. Good for quick setup.
    • advanced: Defaults to a hybrid search configuration combining semantic and full-text. Offers broader results.
    • custom: Allows full control via the search_settings object. If search_settings are omitted in custom mode, default vector search settings are applied.
  • search_settings (Optional): A detailed configuration object. If provided alongside basic or advanced modes, these settings will override the mode’s defaults. Key settings include:
    • use_semantic_search: Boolean to enable/disable vector-based semantic search (default: true unless overridden).
    • use_fulltext_search: Boolean to enable/disable keyword-based full-text search (default: false unless using hybrid).
    • use_hybrid_search: Boolean to enable hybrid search, combining semantic and full-text (default: false). Requires hybrid_settings.
    • filters: Apply complex filtering rules using MongoDB-like syntax (see “Advanced Filtering” below).
    • limit: Integer controlling the maximum number of results to return (default: 10).
    • hybrid_settings: Object to configure weights (semantic_weight, full_text_weight), limits (full_text_limit), and fusion (rrf_k) for hybrid search.
    • chunk_settings: Object to fine-tune vector index parameters like index_measure (distance metric), probes, ef_search.
    • search_strategy: String to enable advanced RAG techniques like "hyde" or "rag_fusion" (default: "vanilla"). See Advanced RAG.
    • include_scores: Boolean to include relevance scores in the results (default: true).
    • include_metadatas: Boolean to include metadata in the results (default: true).
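The settings above can be combined freely in custom mode. The sketch below builds one such configuration; the specific values (weights, filter field, limit) are illustrative, and `client` is assumed to be an initialized `R2RClient`:

```python
# A sketch of a fully custom search configuration combining the settings above.
# Values shown are illustrative, not recommended defaults.
search_settings = {
    "use_semantic_search": True,
    "use_fulltext_search": True,
    "use_hybrid_search": True,  # Requires hybrid_settings
    "hybrid_settings": {
        "semantic_weight": 5.0,
        "full_text_weight": 1.0,
        "rrf_k": 50,
    },
    "filters": {"metadata.year": {"$gte": 2023}},  # MongoDB-like filter syntax
    "limit": 5,
    "include_scores": True,
    "include_metadatas": True,
}

# Assumes `client` is an initialized R2RClient:
# results = client.retrieval.search(
#     query="...",
#     search_mode="custom",
#     search_settings=search_settings,
# )
```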

AI-Powered Search (/retrieval/search)

R2R offers powerful and highly configurable search capabilities. This endpoint returns raw search results without LLM generation.

Basic Search Example

This performs a search using default configurations or a specified mode.

```python
# Uses default settings (likely semantic search in 'custom' mode)
results = client.retrieval.search(
    query="What is DeepSeek R1?",
)

# Explicitly using 'basic' mode
results_basic = client.retrieval.search(
    query="What is DeepSeek R1?",
    search_mode="basic",
)
```

Response Structure (WrappedSearchResponse):

The search endpoint returns a WrappedSearchResponse containing an AggregateSearchResult object with fields like:

  • results.chunk_search_results: A list of relevant text ChunkSearchResult objects found (containing id, document_id, text, score, metadata).
  • results.graph_search_results: A list of relevant GraphSearchResult objects (entities, relationships, communities) if graph search is active and finds results.
  • results.web_search_results: A list of WebSearchResult objects, populated only when web search is enabled (typically via the RAG or Agent endpoints rather than raw search).
```json
// Simplified Example Structure
{
  "results": {
    "chunk_search_results": [
      {
        "score": 0.643,
        "text": "Document Title: DeepSeek_R1.pdf...",
        "id": "chunk-uuid-...",
        "document_id": "doc-uuid-...",
        "metadata": { ... }
      }
      // ... more chunks
    ],
    "graph_search_results": [
      // Example: An entity result if graph search ran
      {
        "id": "graph-entity-uuid...",
        "content": { "name": "DeepSeek-R1", "description": "A large language model...", "id": "entity-uuid..." },
        "result_type": "ENTITY",
        "score": 0.95,
        "metadata": { ... }
      }
      // ... potentially relationships or communities
    ],
    "web_search_results": []
  }
}
```

Hybrid Search Example

Combine keyword-based (full-text) search with vector search for potentially broader results.

```python
hybrid_results = client.retrieval.search(
    query="What was Uber's profit in 2020?",
    search_settings={
        "use_hybrid_search": True,
        "hybrid_settings": {
            "full_text_weight": 1.0,
            "semantic_weight": 5.0,
            "full_text_limit": 200,  # How many full-text results to initially consider
            "rrf_k": 50,  # Parameter for Reciprocal Rank Fusion
        },
        "filters": {"metadata.title": {"$in": ["uber_2021.pdf"]}},  # Filter by metadata field
        "limit": 10,  # Final number of results after fusion/ranking
    },
)
```

Advanced Filtering

Apply filters to narrow search results based on document properties or metadata. Supported operators include $eq, $neq, $gt, $gte, $lt, $lte, $like, $ilike, $in, $nin. You can combine filters using $and and $or.

```python
filtered_results = client.retrieval.search(
    query="What are the effects of climate change?",
    search_settings={
        "filters": {
            "$and": [
                {"document_type": {"$eq": "pdf"}},  # Assuming 'document_type' is stored
                {"metadata.year": {"$gt": 2020}},  # Access nested metadata fields
            ]
        },
        "limit": 10,
    },
)
```

Vector search supports several distance metrics, configured through the chunk_settings.index_measure parameter. Choosing the right distance measure can significantly impact search quality depending on your embeddings and use case:

  • cosine_distance (Default): Measures the cosine of the angle between vectors, ignoring magnitude. Best for comparing documents regardless of their length.
  • l2_distance (Euclidean): Measures the straight-line distance between vectors. Useful when both direction and magnitude matter.
  • max_inner_product: Optimized for finding vectors with similar direction. Good for recommendation systems.
  • l1_distance (Manhattan): Measures the sum of absolute differences. Less sensitive to outliers than L2.
  • hamming_distance: Counts the positions at which vectors differ. Best for binary embeddings.
  • jaccard_distance: Measures dissimilarity between sample sets. Useful for sparse embeddings.
```python
results = client.retrieval.search(
    query="What are the key features of quantum computing?",
    search_settings={
        "chunk_settings": {
            "index_measure": "l2_distance"  # Use Euclidean distance instead of default
        }
    },
)
```

For most text embedding models (e.g., OpenAI’s models), cosine_distance is recommended. For specialized embeddings or specific use cases, experiment with different measures to find the optimal setting for your data.

Knowledge Graph Enhanced Retrieval

Beyond searching through text chunks, R2R can leverage knowledge graphs to enrich the retrieval process. This offers several benefits:

  • Contextual Understanding: Knowledge graphs store information as entities (like people, organizations, concepts) and relationships (like “works for”, “is related to”, “is a type of”). Searching the graph allows R2R to find connections and context that might be missed by purely text-based search.
  • Relationship-Based Queries: Answer questions that rely on understanding connections, such as “What projects is Person X involved in?” or “How does Concept A relate to Concept B?”.
  • Discovering Structure: Graph search can reveal higher-level structures, such as communities of related entities or key connecting concepts within your data.
  • Complementary Results: Graph results (entities, relationships, community summaries) complement text chunks by providing structured information and broader context.

When knowledge graph search is active within R2R, the AggregateSearchResult returned by the Search or RAG endpoints may include relevant items in the graph_search_results list, enhancing the context available for understanding or generation.
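When post-processing an AggregateSearchResult, it can be convenient to group graph results by their type. The sketch below operates on plain dicts shaped like the simplified response example shown earlier (the SDK returns typed objects with the same fields); the helper name and the sample data are illustrative:

```python
# Sketch: group graph search results by result_type, assuming items follow
# the dict shape shown in the simplified response example above.
def group_graph_results(graph_search_results):
    grouped = {"ENTITY": [], "RELATIONSHIP": [], "COMMUNITY": []}
    for item in graph_search_results:
        # setdefault tolerates result types beyond the three listed above
        grouped.setdefault(item["result_type"], []).append(item)
    return grouped

# Illustrative sample data mimicking graph_search_results entries
example_results = [
    {"id": "e1", "result_type": "ENTITY",
     "content": {"name": "DeepSeek-R1", "description": "A large language model..."},
     "score": 0.95},
    {"id": "r1", "result_type": "RELATIONSHIP",
     "content": {"name": "developed_by"},
     "score": 0.80},
]

grouped = group_graph_results(example_results)
```

This keeps entities, relationships, and community summaries separate so each can feed a different part of your UI or prompt template.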

Retrieval-Augmented Generation (RAG) (/retrieval/rag)

R2R’s RAG engine combines the search capabilities above (including text, vector, hybrid, and potentially graph results) with Large Language Models (LLMs) to generate contextually relevant responses grounded in your ingested documents and optional web search results.

RAG Configuration (rag_generation_config)

Control the LLM’s generation process:

  • model: Specify the LLM to use (e.g., "openai/gpt-4o-mini", "anthropic/claude-3-haiku-20240307"). Defaults are set in R2R config.
  • stream: Boolean (default false). Set to true for streaming responses.
  • temperature, max_tokens, top_p, etc.: Standard LLM generation parameters.
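A typical rag_generation_config might look like the sketch below; the model name and parameter values are illustrative, and `client` is assumed to be an initialized `R2RClient`:

```python
# Sketch of a rag_generation_config (model name and values are illustrative).
rag_generation_config = {
    "model": "openai/gpt-4o-mini",
    "stream": False,       # Single response object rather than SSE stream
    "temperature": 0.2,    # Lower temperature for more grounded answers
    "max_tokens": 1024,
}

# Assumes `client` is an initialized R2RClient:
# response = client.retrieval.rag(
#     query="...",
#     rag_generation_config=rag_generation_config,
# )
```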

Basic RAG

Generate a response using retrieved context. Uses the same search_mode and search_settings as the search endpoint to find relevant information.

```python
# Basic RAG call using default search and generation settings
rag_response = client.retrieval.rag(query="What is DeepSeek R1?")
```

Response Structure (WrappedRAGResponse):

The non-streaming RAG endpoint returns a WrappedRAGResponse containing a RAGResponse object with fields like:

  • results.generated_answer: The final synthesized answer from the LLM.
  • results.search_results: The AggregateSearchResult used to generate the answer (containing chunks, possibly graph results, and web results).
  • results.citations: A list of Citation objects linking parts of the answer to specific sources (ChunkSearchResult, GraphSearchResult, WebSearchResult, etc.) found in search_results. Each citation includes an id (short identifier used in the text like [1]) and a payload containing the source object.
  • results.metadata: LLM provider metadata about the generation call.
```json
// Simplified Example Structure
{
  "results": {
    "generated_answer": "DeepSeek-R1 is a model that... [1]. It excels in tasks... [2].",
    "search_results": {
      "chunk_search_results": [ { "id": "chunk-abc...", "text": "...", "score": 0.8 }, /* ... */ ],
      "graph_search_results": [ { /* Graph Entity/Relationship */ } ],
      "web_search_results": [ { "url": "...", "title": "...", "snippet": "..." }, /* ... */ ]
    },
    "citations": [
      {
        "id": "cit.1", // Corresponds to [1] in text
        "object": "citation",
        "payload": { /* ChunkSearchResult for chunk-abc... */ }
      },
      {
        "id": "cit.2", // Corresponds to [2] in text
        "object": "citation",
        "payload": { /* WebSearchResult for relevant web page */ }
      }
      // ... more citations potentially linking to graph results too
    ],
    "metadata": { "model": "openai/gpt-4o-mini", ... }
  }
}
```

RAG with Web Search Integration

Enhance RAG responses with up-to-date information from the web by setting include_web_search=True.

```python
web_rag_response = client.retrieval.rag(
    query="What are the latest developments with DeepSeek R1?",
    include_web_search=True,
)
```

When enabled, R2R performs a web search using the query, and the results are added to the context provided to the LLM alongside results from your documents or knowledge graph.

Combine hybrid search with RAG by configuring search_settings.

```python
hybrid_rag_response = client.retrieval.rag(
    query="Who is Jon Snow?",
    search_settings={"use_hybrid_search": True},
)
```

Streaming RAG

Receive RAG responses as a stream of Server-Sent Events (SSE) by setting stream: True in rag_generation_config. This is ideal for real-time applications.

Event Types:

  1. search_results: Contains the initial AggregateSearchResult (sent once at the beginning).
    • data: The full AggregateSearchResult object (chunks, potentially graph results, web results).
  2. message: Streams partial tokens of the response as they are generated.
    • data.delta.content: The text chunk being streamed.
  3. citation: Indicates when a citation source is identified. Sent once per unique source when it’s first referenced.
    • data.id: The short citation ID (e.g., "cit.1").
    • data.payload: The full source object (ChunkSearchResult, GraphSearchResult, WebSearchResult, etc.).
    • data.is_new: True if this is the first time this citation ID is sent.
    • data.span: The start/end character indices in the current accumulated text where the citation marker (e.g., [1]) appears.
  4. final_answer: Sent once at the end, containing the complete generated answer and structured citations.
    • data.generated_answer: The full final text.
    • data.citations: List of all citations, including their id, payload, and all spans where they appeared in the final text.
```python
from r2r import (
    CitationEvent,
    FinalAnswerEvent,
    MessageEvent,
    SearchResultsEvent,
    R2RClient,
)

# Set stream=True in rag_generation_config
result_stream = client.retrieval.rag(
    query="What is DeepSeek R1?",
    search_settings={"limit": 25},
    rag_generation_config={"stream": True, "model": "openai/gpt-4o-mini"},
    include_web_search=True,
)

for event in result_stream:
    if isinstance(event, SearchResultsEvent):
        print(
            f"Search results received "
            f"(Chunks: {len(event.data.data.chunk_search_results)}, "
            f"Graph: {len(event.data.data.graph_search_results)}, "
            f"Web: {len(event.data.data.web_search_results)})"
        )
    elif isinstance(event, MessageEvent):
        # Access the actual text delta
        if (
            event.data.delta
            and event.data.delta.content
            and event.data.delta.content[0].type == "text"
            and event.data.delta.content[0].payload.value
        ):
            print(event.data.delta.content[0].payload.value, end="", flush=True)
    elif isinstance(event, CitationEvent):
        # The full payload is only sent when is_new is True
        if event.data.is_new:
            print(f"\n<<< New Citation Source Detected: ID={event.data.id} >>>")
    elif isinstance(event, FinalAnswerEvent):
        print("\n\n--- Final Answer ---")
        print(event.data.generated_answer)
        print("\n--- Citations Summary ---")
        for cit in event.data.citations:
            print(f"  ID: {cit.id}, Spans: {cit.span}")
```

Customizing RAG

Besides search_settings, you can customize RAG generation using rag_generation_config.

Example of customizing the model with web search:

```python
# Requires ANTHROPIC_API_KEY env var if using Anthropic models
response = client.retrieval.rag(
    query="Who was Aristotle and what are his recent influences?",
    rag_generation_config={
        "model": "anthropic/claude-3-haiku-20240307",
        "stream": False,  # Get a single response object
        "temperature": 0.5,
    },
    include_web_search=True,
)
print(response.results.generated_answer)
```

Conclusion

R2R’s search and RAG capabilities provide flexible tools for finding and contextualizing information. Whether you need simple semantic search, advanced hybrid retrieval with filtering, or customizable RAG generation incorporating document chunks, knowledge graph insights, and web results via streaming or single responses, the system can be configured to meet your specific needs.

For more advanced use cases:

  • Explore advanced RAG strategies like HyDE and RAG-Fusion in Advanced RAG.
  • Learn about the conversational Agentic RAG system for multi-turn interactions.
  • Dive deeper into specific configuration options in the API & SDK Reference.