Embedding
Configure your embedding system
Embedding System
R2R uses embeddings as the foundation for semantic search and similarity matching capabilities. The embedding system is responsible for converting text into high-dimensional vectors that capture semantic meaning, enabling powerful search and retrieval operations.
R2R uses LiteLLM to route embedding requests because of its provider flexibility. Read more in the LiteLLM documentation.
Embedding Configuration
The embedding system can be customized through the `embedding` section in your `r2r.toml` file, along with corresponding environment variables for sensitive information:
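A minimal sketch of such a section might look like the following (field names and values are illustrative and may differ between R2R versions; check the configuration reference for your release):

```toml
[embedding]
provider = "litellm"
base_model = "openai/text-embedding-3-small"
base_dimension = 512
batch_size = 128
concurrent_request_limit = 256
```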
The environment variables relevant to the above configuration are `OPENAI_API_KEY`, `OPENAI_API_BASE`, `HUGGINGFACE_API_KEY`, and `HUGGINGFACE_API_BASE`.
Advanced Embedding Features in R2R
R2R leverages several advanced embedding features to provide robust text processing and retrieval capabilities:
Batched Processing
R2R implements intelligent batching for embedding operations to optimize throughput and, in some cases, cost:
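As a rough illustration, assuming LiteLLM's `embedding` helper, batched processing might look like the following (the batch size and model name are placeholders, not R2R's internals):

```python
import litellm

def embed_in_batches(texts: list[str],
                     model: str = "openai/text-embedding-3-small",
                     batch_size: int = 128) -> list[list[float]]:
    """Embed texts in fixed-size batches rather than one request per text."""
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]  # one API call per batch
        response = litellm.embedding(model=model, input=batch)
        vectors.extend(item["embedding"] for item in response.data)
    return vectors
```

Fewer, larger requests amortize per-request overhead, which is why batch size trades throughput against latency (see Performance Considerations below).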
Concurrent Request Management
The system implements request handling with rate limiting and concurrency control, sketched after the list below:
- Rate Limiting: Prevents API throttling through intelligent request scheduling
- Concurrent Processing: Manages multiple embedding requests efficiently
- Error Handling: Implements retry logic with exponential backoff
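A minimal sketch of these ideas, combining an `asyncio` semaphore with exponential backoff (the limit value, model, and error handling are illustrative, not R2R's internal implementation):

```python
import asyncio

import litellm

# Illustrative cap; mirrors the concurrent_request_limit setting discussed below.
semaphore = asyncio.Semaphore(8)

async def embed_with_retry(texts: list[str],
                           model: str = "openai/text-embedding-3-small",
                           max_retries: int = 3) -> list[list[float]]:
    """Embed one batch under a concurrency cap, retrying transient failures."""
    async with semaphore:  # bound the number of in-flight provider requests
        for attempt in range(max_retries):
            try:
                response = await litellm.aembedding(model=model, input=texts)
                return [item["embedding"] for item in response.data]
            except Exception:
                if attempt == max_retries - 1:
                    raise  # out of retries; surface the error to the caller
                await asyncio.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
```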
Performance Considerations
When configuring embeddings in R2R, consider these optimization strategies:
- Batch Size Optimization:
  - Larger batch sizes improve throughput but increase latency
  - Consider provider-specific rate limits when setting batch size
  - Balance memory usage with processing speed
- Concurrent Requests:
  - Adjust `concurrent_request_limit` based on provider capabilities
  - Monitor API usage and adjust limits accordingly
  - Consider implementing local caching for frequently embedded texts (see the sketch after this list)
- Model Selection:
  - Balance embedding dimension size with accuracy requirements
  - Consider cost per token for different providers
  - Evaluate multilingual requirements when choosing models
- Resource Management:
  - Monitor memory usage with large batch sizes
  - Implement appropriate error handling and retry strategies
  - Consider implementing local model fallbacks for critical systems
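As an example of the local-caching suggestion above, a minimal in-process cache might look like this (the cache layout and helper name are illustrative, not part of R2R's API):

```python
import hashlib

import litellm

_embedding_cache: dict[str, list[float]] = {}  # (model, text) digest -> vector

def cached_embedding(text: str,
                     model: str = "openai/text-embedding-3-small") -> list[float]:
    """Return a cached vector when available; otherwise call the provider once."""
    key = hashlib.sha256(f"{model}:{text}".encode()).hexdigest()
    if key not in _embedding_cache:
        response = litellm.embedding(model=model, input=[text])
        _embedding_cache[key] = response.data[0]["embedding"]
    return _embedding_cache[key]
```

In production you would bound the cache size (for example, with an LRU policy) and key on the exact model and dimension settings, since vectors from different models are not interchangeable.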
Supported LiteLLM Providers
- OpenAI
- Azure
- Anthropic
- Cohere
- Ollama
- HuggingFace
- Bedrock
- Vertex AI
- Voyage AI
Example configuration:
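An illustrative OpenAI-backed setup might look like this (field names mirror the sketch above and may differ between R2R versions):

```toml
[embedding]
provider = "litellm"
base_model = "openai/text-embedding-3-large"
base_dimension = 3072
batch_size = 128
```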
Supported models include:
- openai/text-embedding-3-small
- openai/text-embedding-3-large
- openai/text-embedding-ada-002