Retrieval Configuration

Configure your retrieval system

Introduction

Search in R2R combines vector-based semantic search with knowledge graph querying. It uses both semantic similarity and relationship-based context to return accurate, contextually relevant results.

R2R’s search capabilities are built on Postgres, which provides:

  • Vector similarity search through the pgvector extension
  • Full-text search using ts_rank and websearch_to_tsquery
  • Efficient indexing with HNSW and IVF-Flat methods
  • Flexible metadata filtering using JSONB
  • Feature-complete user and document management

This integrated approach ensures high performance and reliability while simplifying the overall architecture.

Server-Side Configuration

The base configuration for search capabilities is defined in your r2r.toml file:

[database]
provider = "postgres"
batch_size = 256

[embedding]
provider = "litellm"
base_model = "openai/text-embedding-3-small"
base_dimension = 512
batch_size = 128
concurrent_request_limit = 256
rerank_model = "huggingface/BAAI/bge-reranker-v2-m3" # default is None
rerank_url = "https://huggingface.co/..." # use a valid API url

These settings directly affect search, because the configured embedding model produces the vectors used for semantic search. When a reranking model is specified, it becomes the default reranker used at runtime. See the embedding configuration for detailed parameter information.
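One way to sanity-check these values before starting the server is to parse them with Python's TOML reader. The sketch below is only an illustration and not part of R2R: the function name `check_embedding_config` and the specific checks are assumptions, and it parses an inline copy of the settings rather than a file on disk.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API

# Inline copy of the settings shown above (rerank settings omitted).
CONFIG_TEXT = """
[database]
provider = "postgres"
batch_size = 256

[embedding]
provider = "litellm"
base_model = "openai/text-embedding-3-small"
base_dimension = 512
batch_size = 128
concurrent_request_limit = 256
"""


def check_embedding_config(config: dict) -> list[str]:
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    embedding = config.get("embedding", {})
    # These three keys appear in the example config; treating them as
    # required is an assumption made for this sketch.
    for key in ("provider", "base_model", "base_dimension"):
        if key not in embedding:
            problems.append(f"[embedding] is missing '{key}'")
    dim = embedding.get("base_dimension")
    if dim is not None and (not isinstance(dim, int) or dim <= 0):
        problems.append("base_dimension must be a positive integer")
    return problems


config = tomllib.loads(CONFIG_TEXT)
problems = check_embedding_config(config)
print(problems or "no problems found")
```

To check a real file, open it in binary mode and pass the handle to `tomllib.load` instead of calling `tomllib.loads` on a string.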

Vector Search Configuration

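Vector search can be configured through both server-side settings and runtime parameters. The runtime settings below enable semantic and hybrid search, filter on document metadata, cap the number of results, and restrict the search to one collection: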

chunk_settings = {
    "use_semantic_search": True,
    "filters": {"document_type": {"$eq": "article"}},
    "limit": 20,
    "use_hybrid_search": True,
    "selected_collection_ids": ["c3291abf-8a4e-5d9d-80fd-232ef6fd8526"]
}

For hybrid search, additional weights can be specified:

hybrid_settings = {
    "full_text_weight": 1.0,
    "semantic_weight": 5.0
}
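To build intuition for what these weights do, the sketch below fuses a full-text ranking and a semantic ranking with weighted reciprocal rank fusion. This is a common technique for hybrid search, shown here as an assumption about how such weights can be applied, not as a description of R2R's internal scoring. The function `weighted_rrf`, the `rrf_k` smoothing constant, and the sample document IDs are all hypothetical.

```python
def weighted_rrf(full_text_ids, semantic_ids,
                 full_text_weight=1.0, semantic_weight=5.0, rrf_k=50):
    """Fuse two ranked ID lists with weighted reciprocal rank fusion.

    Each list contributes weight / (rrf_k + rank) for every ID it contains,
    where rank starts at 1 for the top result.
    """
    scores = {}
    weighted_lists = (
        (full_text_weight, full_text_ids),
        (semantic_weight, semantic_ids),
    )
    for weight, ranked_ids in weighted_lists:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (rrf_k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


full_text_results = ["a", "b", "c"]  # best full-text match first
semantic_results = ["c", "d", "a"]   # best semantic match first

fused = weighted_rrf(full_text_results, semantic_results)
print(fused)  # ['c', 'a', 'd', 'b']
```

With a semantic weight five times the full-text weight, "c" (top semantic hit) overtakes "a" (top full-text hit), and "d", which appears only in the semantic list, still outranks "b", which appears only in the full-text list.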

See the Search API Reference for complete parameter details.

Knowledge Graph Search Configuration

Knowledge graph search provides relationship-aware search capabilities:

graph_settings = {
    "enabled": True,
    "entity_types": ["Person", "Organization"],
    "relationships": ["worksFor", "foundedBy"]
}

See the Knowledge Graph API Reference for complete parameter details.

Pipeline Architecture

Usage Examples

from r2r import R2RClient

client = R2RClient()

# chunk_settings and graph_settings are the dictionaries defined in the sections above
response = client.retrieval.search(
    "query",
    search_settings={
        "chunk_settings": chunk_settings,
        "graph_settings": graph_settings
    }
)

Advanced Filtering

# Match documents published on or after 2023-01-01 by either author
filters = {
    "$and": [
        {"publication_date": {"$gte": "2023-01-01"}},
        {"author": {"$in": ["John Doe", "Jane Smith"]}}
    ]
}

# Attach the filters to the settings passed to the search call
search_settings = {"filters": filters}

response = client.retrieval.search("query", search_settings=search_settings)
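To make the filter semantics concrete, the sketch below evaluates the same filter against plain Python metadata dictionaries. It is only an in-memory illustration of what `$and`, `$eq`, `$gte`, and `$in` mean; it is not how R2R applies filters, and the `matches` function and sample documents are hypothetical.

```python
# Comparison operators that appear in this guide. ISO-8601 date strings
# compare correctly as plain strings, so "$gte" works on them directly.
OPERATORS = {
    "$eq": lambda value, operand: value == operand,
    "$gte": lambda value, operand: value is not None and value >= operand,
    "$in": lambda value, operand: value in operand,
}


def matches(metadata: dict, filters: dict) -> bool:
    """Return True if the metadata satisfies every condition in filters."""
    for key, condition in filters.items():
        if key == "$and":
            # "$and" holds a list of sub-filters that must all match.
            if not all(matches(metadata, sub) for sub in condition):
                return False
            continue
        value = metadata.get(key)
        for op, operand in condition.items():
            if op not in OPERATORS:
                raise ValueError(f"unsupported operator in this sketch: {op}")
            if not OPERATORS[op](value, operand):
                return False
    return True


example_filters = {
    "$and": [
        {"publication_date": {"$gte": "2023-01-01"}},
        {"author": {"$in": ["John Doe", "Jane Smith"]}},
    ]
}

documents = [
    {"publication_date": "2023-06-15", "author": "Jane Smith"},
    {"publication_date": "2022-12-31", "author": "John Doe"},
    {"publication_date": "2024-02-01", "author": "Alice Lee"},
]

selected = [doc for doc in documents if matches(doc, example_filters)]
print(selected)
```

Only the first document passes: the second is published too early, and the third has an author outside the `$in` list.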