Search and RAG
R2R provides powerful search and retrieval capabilities through vector search, full-text search, hybrid search, and Retrieval-Augmented Generation (RAG). The system supports multiple search modes and extensive runtime configuration to help you find and contextualize information effectively.
Refer to the retrieval API and SDK reference for detailed retrieval examples.
Search Modes and Settings
When using the Search (/retrieval/search) or RAG (/retrieval/rag) endpoints, you control the retrieval process using search_mode and search_settings.
- search_mode (Optional, defaults to custom): Choose between pre-configured modes or full customization.
  - basic: Defaults to a simple semantic search configuration. Good for quick setup.
  - advanced: Defaults to a hybrid search configuration combining semantic and full-text. Offers broader results.
  - custom: Allows full control via the search_settings object. If search_settings are omitted in custom mode, default vector search settings are applied.
- search_settings (Optional): A detailed configuration object. If provided alongside basic or advanced modes, these settings will override the mode’s defaults. Key settings include:
  - use_semantic_search: Boolean to enable/disable vector-based semantic search (default: true unless overridden).
  - use_fulltext_search: Boolean to enable/disable keyword-based full-text search (default: false unless using hybrid).
  - use_hybrid_search: Boolean to enable hybrid search, combining semantic and full-text (default: false). Requires hybrid_settings.
  - filters: Apply complex filtering rules using MongoDB-like syntax (see “Advanced Filtering” below).
  - limit: Integer controlling the maximum number of results to return (default: 10).
  - hybrid_settings: Object to configure weights (semantic_weight, full_text_weight), limits (full_text_limit), and fusion (rrf_k) for hybrid search.
  - chunk_settings: Object to fine-tune vector index parameters like index_measure (distance metric), probes, ef_search.
  - search_strategy: String to enable advanced RAG techniques like "hyde" or "rag_fusion" (default: "vanilla"). See Advanced RAG.
  - include_scores: Boolean to include relevance scores in the results (default: true).
  - include_metadatas: Boolean to include metadata in the results (default: true).
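As a rough illustration of how search_mode and search_settings fit together, the sketch below assembles a request body that starts from the advanced mode and overrides a few settings. The helper name build_search_body, the "query" field name, the document_type filter field, and all values are assumptions made for illustration; only the setting names come from the list above.

```python
import json


def build_search_body(query, search_mode="custom", search_settings=None):
    """Assemble a JSON-serializable body for /retrieval/search or /retrieval/rag.

    Hypothetical helper: only the setting names come from the list above.
    """
    body = {"query": query, "search_mode": search_mode}
    if search_settings:
        # Leave the key out entirely when empty so the mode's defaults apply.
        body["search_settings"] = search_settings
    return body


body = build_search_body(
    "What is retrieval-augmented generation?",
    search_mode="advanced",
    search_settings={
        "limit": 5,  # overrides the default of 10
        "include_scores": True,
        "filters": {"document_type": {"$eq": "pdf"}},  # hypothetical field name
    },
)
print(json.dumps(body, indent=2))
```

Because the settings travel alongside the mode, anything left out of search_settings keeps the value chosen by the mode.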
AI Powered Search (/retrieval/search)
R2R offers powerful and highly configurable search capabilities. This endpoint returns raw search results without LLM generation.
Basic Search Example
This performs a search using default configurations or a specified mode.
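A minimal sketch of such a request, using only the Python standard library, might look like the following. The base URL (a local server on port 7272 with a /v3 prefix), the "query" field name, and the example query are assumptions; authentication is left out. The request is built but not sent, so the snippet runs without a server.

```python
import json
import urllib.request

# Assumption: a local deployment; adjust host, port, and prefix for your setup.
BASE_URL = "http://localhost:7272/v3"


def make_search_request(query, search_mode="basic"):
    """Build (but do not send) a POST request for the search endpoint."""
    payload = json.dumps({"query": query, "search_mode": search_mode}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/retrieval/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = make_search_request("How does hybrid search work?")
print(request.get_method(), request.full_url)

# To execute against a running server:
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```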
Response Structure (WrappedSearchResponse):
The search endpoint returns a WrappedSearchResponse containing an AggregateSearchResult object with fields like:
- results.chunk_search_results: A list of relevant text ChunkSearchResult objects found (containing id, document_id, text, score, metadata).
- results.graph_search_results: A list of relevant GraphSearchResult objects (entities, relationships, communities) if graph search is active and finds results.
- results.web_search_results: A list of WebSearchResult objects (present only if web search is enabled, which is typically done via RAG or the Agent).
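To make the response shape concrete, here is a small sketch that walks a response dictionary and keeps the best-scoring chunks. The sample data is invented and only mirrors the field names listed above; the helper name top_chunks, the score threshold, and the assumption that a higher score means a better match are illustrative.

```python
# Hypothetical sample shaped like the fields listed above; all values are made up.
sample_response = {
    "results": {
        "chunk_search_results": [
            {"id": "chunk-1", "document_id": "doc-1", "text": "First passage.",
             "score": 0.82, "metadata": {"title": "Guide"}},
            {"id": "chunk-2", "document_id": "doc-2", "text": "Second passage.",
             "score": 0.41, "metadata": {"title": "Notes"}},
        ],
        "graph_search_results": [],
        "web_search_results": [],
    }
}


def top_chunks(response, min_score=0.5):
    """Return (score, text) pairs for chunks at or above min_score, best first."""
    chunks = response.get("results", {}).get("chunk_search_results") or []
    kept = [(c["score"], c["text"]) for c in chunks if c.get("score", 0.0) >= min_score]
    return sorted(kept, reverse=True)


for score, text in top_chunks(sample_response):
    print(f"{score:.2f}  {text}")
```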
Hybrid Search Example
Combine keyword-based (full-text) search with vector search for potentially broader results.
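The sketch below shows one way the hybrid_settings fields could be filled in, together with a small weighted reciprocal rank fusion (RRF) function to illustrate what semantic_weight, full_text_weight, and rrf_k typically control. The numeric values are illustrative rather than documented defaults, and weighted_rrf is a generic RRF formulation, not necessarily the exact fusion R2R performs.

```python
hybrid_search_settings = {
    "use_hybrid_search": True,
    "hybrid_settings": {
        "semantic_weight": 5.0,   # illustrative values, not documented defaults
        "full_text_weight": 1.0,
        "full_text_limit": 200,   # cap on full-text candidates (not used locally)
        "rrf_k": 50,
    },
}


def weighted_rrf(semantic_ids, full_text_ids, semantic_weight, full_text_weight, rrf_k):
    """Fuse two ranked ID lists with weighted reciprocal rank fusion.

    Each list contributes weight / (rrf_k + rank), with rank starting at 1.
    """
    scores = {}
    rankings = ((semantic_weight, semantic_ids), (full_text_weight, full_text_ids))
    for weight, ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (rrf_k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


settings = hybrid_search_settings["hybrid_settings"]
fused = weighted_rrf(
    semantic_ids=["a", "b", "c"],
    full_text_ids=["c", "a", "d"],
    semantic_weight=settings["semantic_weight"],
    full_text_weight=settings["full_text_weight"],
    rrf_k=settings["rrf_k"],
)
print(fused)
```

A larger rrf_k flattens the difference between adjacent ranks, while the weights decide how much each ranking contributes to the fused order.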
Advanced Filtering
Apply filters to narrow search results based on document properties or metadata. Supported operators include $eq, $neq, $gt, $gte, $lt, $lte, $like, $ilike, $in, $nin. You can combine filters using $and and $or.
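As an illustration of how such filters compose, the following sketch defines a nested filter and a tiny local evaluator that mimics the operator semantics on plain dictionaries. This is only a way to reason about filter logic on your side; it is not how R2R evaluates filters internally. The field names (document_type, year, priority) are hypothetical, and $like and $ilike are left out for brevity.

```python
import operator

COMPARATORS = {
    "$eq": operator.eq,
    "$neq": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(record, filters):
    """Evaluate a filter dict against a flat record (local illustration only)."""
    for key, condition in filters.items():
        if key == "$and":
            if not all(matches(record, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(record, sub) for sub in condition):
                return False
        else:
            for op, expected in condition.items():
                # A missing field never matches in this simplified version.
                if key not in record or not COMPARATORS[op](record[key], expected):
                    return False
    return True


# Field names below are hypothetical examples.
filters = {
    "$and": [
        {"document_type": {"$in": ["pdf", "txt"]}},
        {"$or": [{"year": {"$gte": 2023}}, {"priority": {"$eq": "high"}}]},
    ]
}

print(matches({"document_type": "pdf", "year": 2024, "priority": "low"}, filters))
print(matches({"document_type": "html", "year": 2024, "priority": "high"}, filters))
```

The same filters dictionary is what would be passed as the filters entry of search_settings.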
Distance Measures for Vector Search
The distance metric used for vector search is configured through the chunk_settings.index_measure parameter. Choosing the right distance measure can significantly affect search quality, depending on your embeddings and use case:
- cosine_distance (Default): Based on the cosine of the angle between vectors, ignoring magnitude. Best for comparing documents regardless of their length.
- l2_distance (Euclidean): Measures the straight-line distance between vectors. Useful when both direction and magnitude matter.
- max_inner_product: Ranks vectors by their inner product, which reflects both direction and magnitude (for normalized embeddings it produces the same ranking as cosine). Good for recommendation systems.
- l1_distance (Manhattan): Measures the sum of absolute differences. Less sensitive to outliers than L2.
- hamming_distance: Counts the positions at which vectors differ. Best for binary embeddings.
- jaccard_distance: Measures dissimilarity between sample sets. Useful for sparse embeddings.
For most text embedding models (e.g., OpenAI’s models), cosine_distance is recommended. For specialized embeddings or specific use cases, experiment with different measures to find the optimal setting for your data.
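To build intuition for how these measures differ, the sketch below implements each one in plain Python and compares two vectors that point in the same direction but differ in length. These are textbook definitions written for illustration; the function names are local to this example, and the database computes the actual distances. The final dictionary shows how the choice might be passed via chunk_settings.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))


def jaccard_distance(a, b):
    """Jaccard distance for binary vectors: 1 - |intersection| / |union|."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either if either else 0.0


short, longer = [1.0, 2.0], [2.0, 4.0]  # same direction, different magnitude
print("cosine:", round(cosine_distance(short, longer), 6))
print("l2:    ", round(l2_distance(short, longer), 6))
print("l1:    ", l1_distance(short, longer))
print("inner: ", inner_product(short, longer))

bits_a, bits_b = [1, 0, 1, 1], [1, 1, 0, 1]
print("hamming:", hamming_distance(bits_a, bits_b))
print("jaccard:", jaccard_distance(bits_a, bits_b))

# Selecting a measure in a request (values other than the key names are examples).
search_settings = {"chunk_settings": {"index_measure": "cosine_distance"}}
```

The cosine distance between the two real-valued vectors is effectively zero even though their Euclidean and Manhattan distances are not, which is why cosine is a common default for text embeddings.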
Knowledge Graph Enhanced Retrieval
Beyond searching through text chunks, R2R can leverage knowledge graphs to enrich the retrieval process. This offers several benefits:
- Contextual Understanding: Knowledge graphs store information as entities (like people, organizations, concepts) and relationships (like “works for”, “is related to”, “is a type of”). Searching the graph allows R2R to find connections and context that might be missed by purely text-based search.
- Relationship-Based Queries: Answer questions that rely on understanding connections, such as “What projects is Person X involved in?” or “How does Concept A relate to Concept B?”.
- Discovering Structure: Graph search can reveal higher-level structures, such as communities of related entities or key connecting concepts within your data.
- Complementary Results: Graph results (entities, relationships, community summaries) complement text chunks by providing structured information and broader context.
When knowledge graph search is active within R2R, the AggregateSearchResult returned by the Search or RAG endpoints may include relevant items in the graph_search_results list, enhancing the context available for understanding or generation.
Retrieval-Augmented Generation (RAG) (/retrieval/rag)
R2R’s RAG engine combines the search capabilities above (including text, vector, hybrid, and potentially graph results) with Large Language Models (LLMs) to generate contextually relevant responses grounded in your ingested documents and optional web search results.
RAG Configuration (rag_generation_config)
Control the LLM’s generation process:
- model: Specify the LLM to use (e.g., "openai/gpt-4o-mini", "anthropic/claude-3-haiku-20240307"). Defaults are set in R2R config.
- stream: Boolean (default false). Set to true for streaming responses.
- temperature, max_tokens, top_p, etc.: Standard LLM generation parameters.
Basic RAG
Generate a response using retrieved context. Uses the same search_mode and search_settings as the search endpoint to find relevant information.
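A basic RAG request body could be assembled along these lines. The helper name build_rag_body, the "query" field name, and the parameter values are assumptions for illustration; the rag_generation_config keys are the ones described in this section.

```python
import json


def build_rag_body(query, search_mode="custom", search_settings=None,
                   rag_generation_config=None):
    """Assemble a JSON-serializable body for the /retrieval/rag endpoint.

    Hypothetical helper: optional sections are only included when provided,
    so the server-side defaults apply otherwise.
    """
    body = {"query": query, "search_mode": search_mode}
    if search_settings is not None:
        body["search_settings"] = search_settings
    if rag_generation_config is not None:
        body["rag_generation_config"] = rag_generation_config
    return body


rag_body = build_rag_body(
    "Summarize the onboarding guide.",
    search_mode="basic",
    rag_generation_config={
        "model": "openai/gpt-4o-mini",
        "temperature": 0.2,   # illustrative values
        "max_tokens": 300,
        "stream": False,
    },
)
print(json.dumps(rag_body, indent=2))
```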
Response Structure (WrappedRAGResponse):
The non-streaming RAG endpoint returns a WrappedRAGResponse containing an RAGResponse object with fields like:
- results.generated_answer: The final synthesized answer from the LLM.
- results.search_results: The AggregateSearchResult used to generate the answer (containing chunks, possibly graph results, and web results).
- results.citations: A list of Citation objects linking parts of the answer to specific sources (ChunkSearchResult, GraphSearchResult, WebSearchResult, etc.) found in search_results. Each citation includes an id (short identifier used in the text like [1]) and a payload containing the source object.
- results.metadata: LLM provider metadata about the generation call.
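The following sketch shows one way a client might turn such a response into an answer followed by a source list. The sample response is invented and only mirrors the fields described above; in particular, the exact format of the citation id and the payload contents are assumptions, as is the helper name format_answer_with_sources.

```python
# Hypothetical sample shaped like the fields described above; values are made up.
sample_rag_response = {
    "results": {
        "generated_answer": "Hybrid search combines two signals [1].",
        "citations": [
            {
                "id": "1",
                "payload": {
                    "id": "chunk-1",
                    "document_id": "doc-1",
                    "text": "Hybrid search combines semantic and full-text search.",
                },
            }
        ],
        "search_results": {"chunk_search_results": []},
        "metadata": {},
    }
}


def format_answer_with_sources(response, snippet_length=60):
    """Render the generated answer followed by one line per citation."""
    results = response["results"]
    lines = [results["generated_answer"], "", "Sources:"]
    for citation in results.get("citations", []):
        payload = citation.get("payload", {})
        snippet = payload.get("text", "")[:snippet_length]
        document_id = payload.get("document_id", "unknown")
        lines.append(f"[{citation['id']}] {document_id}: {snippet}")
    return "\n".join(lines)


formatted = format_answer_with_sources(sample_rag_response)
print(formatted)
```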
RAG with Web Search Integration
Enhance RAG responses with up-to-date information from the web by setting include_web_search=True.
When enabled, R2R performs a web search using the query, and the results are added to the context provided to the LLM alongside results from your documents or knowledge graph.
RAG with Hybrid Search
Combine hybrid search with RAG by configuring search_settings.
Streaming RAG
Receive RAG responses as a stream of Server-Sent Events (SSE) by setting stream: True in rag_generation_config. This is ideal for real-time applications.
Event Types:
- search_results: Contains the initial AggregateSearchResult (sent once at the beginning).
  - data: The full AggregateSearchResult object (chunks, potentially graph results, web results).
- message: Streams partial tokens of the response as they are generated.
  - data.delta.content: The text chunk being streamed.
- citation: Indicates when a citation source is identified. Sent once per unique source when it’s first referenced.
  - data.id: The short citation ID (e.g., "cit.1").
  - data.payload: The full source object (ChunkSearchResult, GraphSearchResult, WebSearchResult, etc.).
  - data.is_new: True if this is the first time this citation ID is sent.
  - data.span: The start/end character indices in the current accumulated text where the citation marker (e.g., [1]) appears.
- final_answer: Sent once at the end, containing the complete generated answer and structured citations.
  - data.generated_answer: The full final text.
  - data.citations: List of all citations, including their id, payload, and all spans where they appeared in the final text.
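A client consuming this stream needs to split the SSE text into events and dispatch on the event type. The sketch below does this against a hard-coded sample stream so it can run without a server. The sample payloads are invented: they assume delta.content is a plain string as described above, whereas real payloads may be more deeply nested. The helper names parse_sse and consume are hypothetical.

```python
import json


def parse_sse(raw_text):
    """Yield (event_name, data_dict) pairs from raw Server-Sent Events text."""
    for block in raw_text.strip().split("\n\n"):
        event_name, data_lines = "message", []
        for line in block.splitlines():
            if line.startswith("event:"):
                event_name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            yield event_name, json.loads("\n".join(data_lines))


def consume(events):
    """Accumulate streamed text and collect citation IDs in order of first appearance."""
    text_parts, citation_ids, final_answer = [], [], None
    for name, data in events:
        if name == "message":
            text_parts.append(data["delta"]["content"])
        elif name == "citation" and data.get("is_new", True):
            citation_ids.append(data["id"])
        elif name == "final_answer":
            final_answer = data["generated_answer"]
        # search_results events are ignored here; a UI might render them first.
    return "".join(text_parts), citation_ids, final_answer


# Hypothetical stream; real payloads may carry more fields.
sample_stream = (
    'event: search_results\ndata: {"chunk_search_results": []}\n\n'
    'event: message\ndata: {"delta": {"content": "Hybrid search "}}\n\n'
    'event: message\ndata: {"delta": {"content": "helps [1]."}}\n\n'
    'event: citation\ndata: {"id": "cit.1", "is_new": true, "payload": {}, '
    '"span": {"start": 20, "end": 23}}\n\n'
    'event: final_answer\ndata: {"generated_answer": "Hybrid search helps [1].", '
    '"citations": []}\n\n'
)

streamed_text, citation_ids, final_answer = consume(parse_sse(sample_stream))
print(streamed_text)
print(citation_ids)
```

Keeping parsing separate from handling makes it straightforward to swap the sample string for lines read from a live HTTP response.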
Customizing RAG
Besides search_settings, you can customize RAG generation using rag_generation_config.
Example of customizing the model with web search:
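One possible shape for such a request is sketched here. It assumes include_web_search is sent as a top-level field next to rag_generation_config; the helper name customized_rag_body, the local default dictionary, the "query" field name, and the parameter values are all illustrative.

```python
import json

# Hypothetical client-side defaults; the server applies its own configured defaults.
DEFAULT_GENERATION_CONFIG = {"stream": False}


def customized_rag_body(query, include_web_search=False, **generation_overrides):
    """Build a RAG request body with per-call generation overrides."""
    config = {**DEFAULT_GENERATION_CONFIG, **generation_overrides}
    return {
        "query": query,
        "rag_generation_config": config,
        "include_web_search": include_web_search,  # assumed to be a top-level field
    }


custom_body = customized_rag_body(
    "What changed in the latest release?",
    include_web_search=True,
    model="anthropic/claude-3-haiku-20240307",
    temperature=0.7,
    max_tokens=1500,
)
print(json.dumps(custom_body, indent=2))
```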
Conclusion
R2R’s search and RAG capabilities provide flexible tools for finding and contextualizing information. Whether you need simple semantic search, advanced hybrid retrieval with filtering, or customizable RAG generation incorporating document chunks, knowledge graph insights, and web results via streaming or single responses, the system can be configured to meet your specific needs.
For more advanced use cases:
- Explore advanced RAG strategies like HyDE and RAG-Fusion in Advanced RAG.
- Learn about the conversational Agentic RAG system for multi-turn interactions.
- Dive deeper into specific configuration options in the API & SDK Reference.