Hybrid Search
Learn how to implement and use hybrid search with R2R
Introduction
R2R’s hybrid search combines traditional keyword-based searching with modern semantic understanding, providing more accurate and contextually relevant results. This approach is particularly effective for complex queries where both specific terms and overall meaning are crucial.
How R2R Hybrid Search Works
- Full-Text Search: Utilizes PostgreSQL’s full-text search with ts_rank_cd and websearch_to_tsquery.
- Semantic Search: Performs vector similarity search using the query’s embedded representation.
- Reciprocal Rank Fusion (RRF): Merges results from full-text and semantic searches by scoring each result from the reciprocal of its rank in each list.
- Result Ranking: Orders final results based on the combined RRF score.
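The fusion and ranking steps can be sketched in a few lines of Python. This is a minimal illustration of the standard, unweighted form of RRF, not R2R's internal implementation: the function name `reciprocal_rank_fusion`, the document IDs, and the constant `k=60` are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one scored list.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (assumed value); larger k flattens rank differences.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the two retrievers.
full_text_hits = ["doc_a", "doc_b", "doc_c"]
semantic_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([full_text_hits, semantic_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that appear in both lists accumulate two contributions, so they tend to rise above documents found by only one retriever, even when neither retriever ranked them first.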
Key Features
Full-Text Search
The full-text search component incorporates:
- PostgreSQL’s tsvector for efficient text searching
- websearch_to_tsquery for parsing user queries
- ts_rank_cd for ranking full-text search results
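To show how these three pieces fit together in one query, here is a hedged sketch that builds a parameterized SQL statement. The table name `chunks`, the tsvector column `fts`, the `'english'` text search configuration, and the helper `build_full_text_query` are all hypothetical; R2R's actual schema and query may differ. The `%s` placeholders follow the style used by psycopg-like drivers.

```python
def build_full_text_query(user_query, limit=10):
    """Return (sql, params) for a ranked PostgreSQL full-text search.

    Assumes a hypothetical table `chunks` with a tsvector column `fts`.
    """
    sql = (
        "SELECT id, ts_rank_cd(fts, query) AS rank "
        # websearch_to_tsquery parses search-engine style input,
        # e.g. quoted phrases and -excluded terms.
        "FROM chunks, websearch_to_tsquery('english', %s) AS query "
        # @@ is the match operator between tsvector and tsquery.
        "WHERE fts @@ query "
        "ORDER BY rank DESC "
        "LIMIT %s"
    )
    # Values are passed separately so the driver handles escaping.
    return sql, (user_query, limit)


sql, params = build_full_text_query('"hybrid search" -legacy', limit=5)
print(sql)
print(params)
```

Keeping the user's text as a bound parameter, rather than interpolating it into the SQL string, avoids injection problems and lets websearch_to_tsquery handle the query syntax.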
Semantic Search
The semantic search component uses:
- Vector embeddings for storing and querying semantic representations
- Cosine similarity for measuring the relevance of documents to the query
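As a small illustration of the relevance measure, the sketch below computes cosine similarity in plain Python and ranks a few toy vectors against a query. The three-dimensional vectors and document IDs are made up for the example; real embeddings have hundreds or thousands of dimensions, and in practice the comparison would typically run inside the database rather than in application code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy embeddings (hypothetical values).
query_vec = [1.0, 0.0, 1.0]
docs = {
    "doc_a": [1.0, 0.0, 1.0],  # same direction as the query
    "doc_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "doc_c": [1.0, 1.0, 0.0],  # partially aligned
}

# Rank documents by similarity to the query, most similar first.
ranked = sorted(
    docs, key=lambda d: cosine_similarity(query_vec, docs[d]), reverse=True
)
print(ranked)
```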
Configuration
VectorSearchSettings
Key settings for vector search configuration:
HybridSearchSettings
Specific parameters for hybrid search:
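The following sketch illustrates how parameters of this kind can shape a fused score. The class `HybridSettingsSketch`, its field names, and its default values are hypothetical and are not taken from R2R's HybridSearchSettings; the weighted form of RRF shown here is likewise an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridSettingsSketch:
    """Hypothetical settings container; names and defaults are illustrative."""

    full_text_weight: float = 1.0
    semantic_weight: float = 1.0
    rrf_k: int = 60

    def __post_init__(self):
        if self.full_text_weight < 0 or self.semantic_weight < 0:
            raise ValueError("weights must be non-negative")
        if self.rrf_k <= 0:
            raise ValueError("rrf_k must be positive")


def weighted_rrf_score(settings, full_text_rank=None, semantic_rank=None):
    """Weighted RRF score for one document.

    A rank of None means the document was absent from that result list.
    """
    score = 0.0
    if full_text_rank is not None:
        score += settings.full_text_weight / (settings.rrf_k + full_text_rank)
    if semantic_rank is not None:
        score += settings.semantic_weight / (settings.rrf_k + semantic_rank)
    return score


balanced = HybridSettingsSketch()
semantic_heavy = HybridSettingsSketch(semantic_weight=5.0)

# One document favored by keyword search, one favored by semantic search.
keyword_doc = {"full_text_rank": 1, "semantic_rank": 20}
semantic_doc = {"full_text_rank": 20, "semantic_rank": 1}

for name, settings in [("balanced", balanced), ("semantic_heavy", semantic_heavy)]:
    kw = weighted_rrf_score(settings, **keyword_doc)
    sem = weighted_rrf_score(settings, **semantic_doc)
    print(f"{name}: keyword_doc={kw:.4f} semantic_doc={sem:.4f}")
```

With equal weights the two documents tie; raising the semantic weight moves the semantically favored document ahead. That is the kind of effect to watch for when tuning weights against real queries.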
Usage Example
Results Comparison
Basic Vector Search
Hybrid Search with RRF
Best Practices
- Optimize PostgreSQL indexing for both full-text and vector searches
- Regularly update search indices
- Monitor performance and adjust weights as needed
- Use appropriate vector dimensions and embedding models for your use case
Conclusion
R2R’s hybrid search offers a powerful solution for complex information retrieval needs, combining the strengths of keyword matching and semantic understanding. Its flexible configuration and use of Reciprocal Rank Fusion make it adaptable to a wide range of use cases, from technical documentation to broad, context-dependent queries.