Hybrid Search
Learn how to implement and use hybrid search with R2R
Introduction
R2R’s hybrid search combines traditional keyword-based searching with modern semantic understanding, providing more accurate and contextually relevant results. This approach is particularly effective for complex queries where both specific terms and overall meaning are crucial.
How R2R Hybrid Search Works
- Full-Text Search: Utilizes PostgreSQL’s full-text search with ts_rank_cd and websearch_to_tsquery.
- Semantic Search: Performs vector similarity search using the query’s embedded representation.
- Reciprocal Rank Fusion (RRF): Merges results from full-text and semantic searches using the formula:
COALESCE(1.0 / (rrf_k + full_text.rank_ix), 0.0) * full_text_weight + COALESCE(1.0 / (rrf_k + semantic.rank_ix), 0.0) * semantic_weight
- Result Ranking: Orders final results based on the combined RRF score.
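The fusion step can be sketched in a few lines of Python. This is a minimal illustration of the formula above, not R2R's actual implementation (which runs in SQL); the function name rrf_fuse and the use of plain ID lists are assumptions made for the example. A document missing from one list contributes 0.0 for that list, mirroring the COALESCE in the formula.

```python
from collections import defaultdict


def rrf_fuse(full_text_ids, semantic_ids,
             full_text_weight=1.0, semantic_weight=1.0, rrf_k=50):
    """Fuse two ranked lists of document IDs with weighted RRF.

    Hypothetical helper for illustration. Ranks are 1-based, and a
    document absent from a list adds nothing for that list.
    """
    scores = defaultdict(float)
    for rank, doc_id in enumerate(full_text_ids, start=1):
        scores[doc_id] += full_text_weight / (rrf_k + rank)
    for rank, doc_id in enumerate(semantic_ids, start=1):
        scores[doc_id] += semantic_weight / (rrf_k + rank)
    # Highest combined score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    full_text = ["a", "b", "c"]   # best full-text match first
    semantic = ["b", "d"]         # best semantic match first
    for doc_id, score in rrf_fuse(full_text, semantic, semantic_weight=5.0):
        print(doc_id, round(score, 5))
```

Raising semantic_weight moves documents that rank well semantically (here "d") ahead of documents found only by keyword match, while a larger rrf_k flattens the difference between adjacent ranks.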
Key Features
Full-Text Search
The full-text search component incorporates:
- PostgreSQL’s tsvector for efficient text searching
- websearch_to_tsquery for parsing user queries
- ts_rank_cd for ranking full-text search results
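To show how these three pieces fit together, here is a rough sketch that builds a parameterized query of this kind. The table name chunks, the tsvector column fts, the 'english' text search configuration, and the psycopg-style %(name)s placeholders are all assumptions for illustration; R2R's real schema and query may differ.

```python
def build_full_text_query(user_query, table="chunks", tsv_column="fts", limit=200):
    """Return (sql, params) for a ranked full-text search.

    Hypothetical helper: table and column names are placeholders and
    must come from trusted configuration, never from user input.
    """
    sql = (
        "SELECT id, "
        # ts_rank_cd scores each row's tsvector against the parsed query.
        f"ts_rank_cd({tsv_column}, websearch_to_tsquery('english', %(query)s)) AS rank "
        f"FROM {table} "
        # @@ keeps only rows whose tsvector matches the parsed query.
        f"WHERE {tsv_column} @@ websearch_to_tsquery('english', %(query)s) "
        "ORDER BY rank DESC "
        "LIMIT %(limit)s"
    )
    # The user's text is passed as a bound parameter, not interpolated.
    params = {"query": user_query, "limit": limit}
    return sql, params


if __name__ == "__main__":
    sql, params = build_full_text_query("Who was Aristotle?")
    print(sql)
    print(params)
```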
Semantic Search
The semantic search component uses:
- Vector embeddings for storing and querying semantic representations
- Cosine similarity for measuring the relevance of documents to the query
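For reference, cosine similarity between a query embedding and a document embedding can be computed as below. This is a plain-Python sketch for illustration only; in practice the comparison happens inside the database, and the helper name cosine_similarity is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    query_vec = [1.0, 2.0]
    doc_vec = [2.0, 4.0]      # same direction as the query
    other_vec = [2.0, -1.0]   # orthogonal to the query
    print(cosine_similarity(query_vec, doc_vec))
    print(cosine_similarity(query_vec, other_vec))
```

Because the measure depends only on direction, vectors that point the same way score close to 1 regardless of their length, and orthogonal vectors score 0.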
Configuration
VectorSearchSettings
Key settings for vector search configuration:
class VectorSearchSettings(BaseModel):
    use_hybrid_search: bool
    search_limit: int
    filters: dict[str, Any]
    hybrid_search_settings: Optional[HybridSearchSettings]
    # ... other settings
HybridSearchSettings
Specific parameters for hybrid search:
class HybridSearchSettings(BaseModel):
    full_text_weight: float
    semantic_weight: float
    full_text_limit: int
    rrf_k: int
Usage Example
from r2r import R2RClient

client = R2RClient()

vector_settings = {
    "use_hybrid_search": True,
    "search_limit": 20,
    # Can add logical filters, as shown:
    # "filters": {"category": {"$eq": "technology"}},
    "hybrid_search_settings": {
        "full_text_weight": 1.0,
        "semantic_weight": 5.0,
        "full_text_limit": 200,
        "rrf_k": 50
    }
}

results = client.search(
    query="Who was Aristotle?",
    vector_search_settings=vector_settings
)
print(results)
Results Comparison
Basic Vector Search
{
  "results": {
    "vector_search_results": [
      {
        "score": 0.780314067545999,
        "text": "Aristotle[A] (Greek: Ἀριστοτέλης Aristotélēs, pronounced [aristotélɛːs]; 384–322 BC) was an Ancient Greek philosopher and polymath...",
        "metadata": {
          "title": "aristotle.txt",
          "version": "v0",
          "chunk_order": 0
        }
      }, ...
    ]
  }
}
Hybrid Search with RRF
Here the score is the combined RRF score rather than a cosine similarity, so it is not directly comparable to the basic vector search score. The metadata also reports each result’s semantic_rank and full_text_rank.
{
  "results": {
    "vector_search_results": [
      {
        "score": 0.0185185185185185,
        "text": "Aristotle[A] (Greek: Ἀριστοτέλης Aristotélēs, pronounced [aristotélɛːs]; 384–322 BC) was an Ancient Greek philosopher and polymath...",
        "metadata": {
          "title": "aristotle.txt",
          "version": "v0",
          "chunk_order": 0,
          "semantic_rank": 1,
          "full_text_rank": 3
        }
      }, ...
    ]
  }
}
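When comparing the two modes, it can help to pull out just the score and rank fields from each response. The sketch below assumes the response is a plain dict shaped like the JSON shown above; the helper name summarize_hits is hypothetical, and the actual response shape may vary between R2R versions.

```python
def summarize_hits(response):
    """Extract title, score, and rank fields from a search response dict.

    Hypothetical helper: assumes the nested shape shown in the sample
    output. Rank fields are None when the response lacks them, as in
    basic vector search.
    """
    hits = response.get("results", {}).get("vector_search_results", [])
    rows = []
    for hit in hits:
        meta = hit.get("metadata", {})
        rows.append({
            "title": meta.get("title"),
            "score": hit.get("score"),
            "semantic_rank": meta.get("semantic_rank"),
            "full_text_rank": meta.get("full_text_rank"),
        })
    return rows


if __name__ == "__main__":
    # Trimmed copy of the hybrid search sample above.
    sample = {
        "results": {
            "vector_search_results": [
                {
                    "score": 0.0185185185185185,
                    "metadata": {
                        "title": "aristotle.txt",
                        "semantic_rank": 1,
                        "full_text_rank": 3,
                    },
                }
            ]
        }
    }
    for row in summarize_hits(sample):
        print(row)
```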
Best Practices
- Optimize PostgreSQL indexing for both full-text and vector searches
- Regularly update search indices
- Monitor performance and adjust weights as needed
- Use appropriate vector dimensions and embedding models for your use case
Conclusion
R2R’s hybrid search offers a powerful solution for complex information retrieval needs, combining the strengths of keyword matching and semantic understanding. Its flexible configuration and use of Reciprocal Rank Fusion make it adaptable to a wide range of use cases, from technical documentation to broad, context-dependent queries.