Embedding

Configure your embedding system

Embedding System

R2R uses embeddings as the foundation for semantic search and similarity matching capabilities. The embedding system is responsible for converting text into high-dimensional vectors that capture semantic meaning, enabling powerful search and retrieval operations.
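
Similarity matching over these vectors is typically scored with a metric such as cosine similarity. A self-contained illustration (not R2R's internal implementation):

from typing import List
import math

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Score how closely two embedding vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Vectors for semantically related texts score near 1.0; unrelated texts score lower.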

R2R routes embedding requests through LiteLLM because of its provider flexibility. Read more in the LiteLLM documentation.
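
For illustration, a standalone LiteLLM embedding call looks roughly like the following (independent of R2R; assumes OPENAI_API_KEY is set, and the exact response shape may vary by LiteLLM version):

from litellm import embedding

# LiteLLM routes the request to OpenAI based on the "openai/" model prefix
response = embedding(
    model="openai/text-embedding-3-small",
    input=["R2R converts text into vectors for semantic search."],
)
vector = response.data[0]["embedding"]  # a list of floats
print(len(vector))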

Embedding Configuration

The embedding system can be customized through the [embedding] section of your r2r.toml file, along with corresponding environment variables for sensitive information:

r2r.toml
[embedding]
provider = "litellm" # defaults to "litellm"
base_model = "openai/text-embedding-3-small" # defaults to "openai/text-embedding-3-large"
base_dimension = 512 # defaults to 3072
batch_size = 512 # defaults to 128
rerank_model = "BAAI/bge-reranker-v2-m3" # defaults to None
concurrent_request_limit = 256 # defaults to 256

Environment variables relevant to the above configuration include OPENAI_API_KEY, OPENAI_API_BASE, HUGGINGFACE_API_KEY, and HUGGINGFACE_API_BASE.

Advanced Embedding Features in R2R

R2R leverages several advanced embedding features to provide robust text processing and retrieval capabilities:

Batched Processing

R2R implements intelligent batching for embedding operations to optimize throughput and, in some cases, cost:

from typing import List

class EmbeddingProvider:
    async def embed_texts(self, texts: List[str]) -> List[List[float]]:
        # Split the input into fixed-size batches to respect provider limits
        batches = [texts[i:i + self.batch_size] for i in range(0, len(texts), self.batch_size)]
        embeddings = []
        for batch in batches:
            # Each batch is embedded in a single provider request
            batch_embeddings = await self._process_batch(batch)
            embeddings.extend(batch_embeddings)
        return embeddings
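
For example, with batch_size = 512 as configured above, a list of 1,300 texts would be split into three requests of 512, 512, and 276 texts.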

Concurrent Request Management

The system implements request handling with rate limiting and concurrency control (a sketch follows the list):

  1. Rate Limiting: Prevents API throttling through intelligent request scheduling
  2. Concurrent Processing: Manages multiple embedding requests efficiently
  3. Error Handling: Implements retry logic with exponential backoff
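
A minimal sketch of how such a limiter can be built with asyncio (the ConcurrencyLimiter class and its interface are illustrative assumptions, not R2R's actual internals):

import asyncio
import random

class ConcurrencyLimiter:
    """Caps in-flight requests and retries failures with exponential backoff."""

    def __init__(self, concurrent_request_limit: int = 256, max_retries: int = 3):
        self._semaphore = asyncio.Semaphore(concurrent_request_limit)
        self._max_retries = max_retries

    async def run(self, make_request):
        # make_request is a zero-argument callable returning an awaitable
        async with self._semaphore:  # enforce the concurrency cap
            for attempt in range(self._max_retries + 1):
                try:
                    return await make_request()
                except Exception:
                    if attempt == self._max_retries:
                        raise
                    # back off 1s, 2s, 4s, ... plus jitter to avoid thundering herds
                    await asyncio.sleep(2 ** attempt + random.random())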

Performance Considerations

When configuring embeddings in R2R, consider these optimization strategies:

  1. Batch Size Optimization:

    • Larger batch sizes improve throughput but increase latency
    • Consider provider-specific rate limits when setting batch size
    • Balance memory usage with processing speed
  2. Concurrent Requests:

    • Adjust concurrent_request_limit based on provider capabilities
    • Monitor API usage and adjust limits accordingly
    • Consider implementing local caching for frequently embedded texts (see the sketch after this list)
  3. Model Selection:

    • Balance embedding dimension size with accuracy requirements
    • Consider cost per token for different providers
    • Evaluate multilingual requirements when choosing models
  4. Resource Management:

    • Monitor memory usage with large batch sizes
    • Implement appropriate error handling and retry strategies
    • Consider implementing local model fallbacks for critical systems
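
As referenced above, a minimal in-memory caching sketch (the CachedEmbedder wrapper is a hypothetical illustration, not part of R2R):

import hashlib
from typing import Dict, List

class CachedEmbedder:
    """Caches embedding vectors by text hash to avoid repeat API calls."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # async callable: List[str] -> List[List[float]]
        self._cache: Dict[str, List[float]] = {}

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    async def embed(self, texts: List[str]) -> List[List[float]]:
        missing = [t for t in texts if self._key(t) not in self._cache]
        if missing:
            vectors = await self._embed_fn(missing)
            for text, vector in zip(missing, vectors):
                self._cache[self._key(text)] = vector
        return [self._cache[self._key(t)] for t in texts]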

Supported LiteLLM Providers

LiteLLM supports a wide range of embedding providers; the example below uses OpenAI.

Example configuration:

example r2r.toml
[embedding]
provider = "litellm"
base_model = "openai/text-embedding-3-small"
base_dimension = 512

Then set the required environment variables and start the server:

export OPENAI_API_KEY=your_openai_key
# .. set other environment variables

r2r serve --config-path=r2r.toml

Supported models include:

  • openai/text-embedding-3-small
  • openai/text-embedding-3-large
  • openai/text-embedding-ada-002