Embedding

Configure your embedding system

Embedding System

R2R uses embeddings as the foundation for semantic search and similarity matching capabilities. The embedding system is responsible for converting text into high-dimensional vectors that capture semantic meaning, enabling powerful search and retrieval operations.
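
Similarity matching over these vectors is typically scored with a metric such as cosine similarity. A self-contained illustration (not R2R's internal implementation):

from typing import List
import math

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Score how closely two embedding vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Vectors for semantically related texts score near 1.0; unrelated texts score lower.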

R2R routes embedding requests through LiteLLM because of its provider flexibility. Read more in the LiteLLM documentation.
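
For illustration, a standalone LiteLLM embedding call looks roughly like the following (independent of R2R; assumes OPENAI_API_KEY is set, and the exact response shape may vary by LiteLLM version):

from litellm import embedding

# LiteLLM routes the request to OpenAI based on the "openai/" model prefix
response = embedding(
    model="openai/text-embedding-3-small",
    input=["R2R converts text into vectors for semantic search."],
)
vector = response.data[0]["embedding"]  # a list of floats
print(len(vector))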

Embedding Configuration

The embedding system can be customized through the [embedding] section of your r2r.toml file, along with corresponding environment variables for sensitive information:

r2r.toml
[embedding]
provider = "litellm" # defaults to "litellm"
base_model = "openai/text-embedding-3-small" # defaults to "openai/text-embedding-3-large"
base_dimension = 512 # defaults to 3072
batch_size = 512 # defaults to 128
rerank_model = "BAAI/bge-reranker-v2-m3" # defaults to None
concurrent_request_limit = 256 # defaults to 256

Environment variables relevant to the above configuration include OPENAI_API_KEY, OPENAI_API_BASE, HUGGINGFACE_API_KEY, and HUGGINGFACE_API_BASE.

Advanced Embedding Features in R2R

R2R leverages several advanced embedding features to provide robust text processing and retrieval capabilities:

Batched Processing

R2R implements intelligent batching for embedding operations to optimize throughput and, in some cases, cost:

from typing import List

class EmbeddingProvider:
    async def embed_texts(self, texts: List[str]) -> List[List[float]]:
        # Split the input into fixed-size batches to respect provider limits
        batches = [texts[i:i + self.batch_size] for i in range(0, len(texts), self.batch_size)]
        embeddings = []
        for batch in batches:
            # Each batch is embedded in a single provider request
            batch_embeddings = await self._process_batch(batch)
            embeddings.extend(batch_embeddings)
        return embeddings
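
For example, with batch_size = 512 as configured above, a list of 1,300 texts would be split into three requests of 512, 512, and 276 texts.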

Concurrent Request Management

The system implements request handling with rate limiting and concurrency control (a sketch follows the list):

  1. Rate Limiting: Prevents API throttling through intelligent request scheduling
  2. Concurrent Processing: Manages multiple embedding requests efficiently
  3. Error Handling: Implements retry logic with exponential backoff
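
A minimal sketch of how such a limiter can be built with asyncio (the ConcurrencyLimiter class and its interface are illustrative assumptions, not R2R's actual internals):

import asyncio
import random

class ConcurrencyLimiter:
    """Caps in-flight requests and retries failures with exponential backoff."""

    def __init__(self, concurrent_request_limit: int = 256, max_retries: int = 3):
        self._semaphore = asyncio.Semaphore(concurrent_request_limit)
        self._max_retries = max_retries

    async def run(self, make_request):
        # make_request is a zero-argument callable returning an awaitable
        async with self._semaphore:  # enforce the concurrency cap
            for attempt in range(self._max_retries + 1):
                try:
                    return await make_request()
                except Exception:
                    if attempt == self._max_retries:
                        raise
                    # back off 1s, 2s, 4s, ... plus jitter to avoid thundering herds
                    await asyncio.sleep(2 ** attempt + random.random())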

Performance Considerations

When configuring embeddings in R2R, consider these optimization strategies:

  1. Batch Size Optimization:

    • Larger batch sizes improve throughput but increase latency
    • Consider provider-specific rate limits when setting batch size
    • Balance memory usage with processing speed
  2. Concurrent Requests:

    • Adjust concurrent_request_limit based on provider capabilities
    • Monitor API usage and adjust limits accordingly
    • Consider implementing local caching for frequently embedded texts (see the sketch after this list)
  3. Model Selection:

    • Balance embedding dimension size with accuracy requirements
    • Consider cost per token for different providers
    • Evaluate multilingual requirements when choosing models
  4. Resource Management:

    • Monitor memory usage with large batch sizes
    • Implement appropriate error handling and retry strategies
    • Consider implementing local model fallbacks for critical systems
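
As referenced above, a minimal in-memory caching sketch (the CachedEmbedder wrapper is a hypothetical illustration, not part of R2R):

import hashlib
from typing import Dict, List

class CachedEmbedder:
    """Caches embedding vectors by text hash to avoid repeat API calls."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # async callable: List[str] -> List[List[float]]
        self._cache: Dict[str, List[float]] = {}

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    async def embed(self, texts: List[str]) -> List[List[float]]:
        missing = [t for t in texts if self._key(t) not in self._cache]
        if missing:
            vectors = await self._embed_fn(missing)
            for text, vector in zip(missing, vectors):
                self._cache[self._key(text)] = vector
        return [self._cache[self._key(t)] for t in texts]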

Supported LiteLLM Providers

LiteLLM supports a wide range of embedding providers; the example below uses OpenAI.

Example configuration:

example r2r.toml
[embedding]
provider = "litellm"
base_model = "openai/text-embedding-3-small"
base_dimension = 512

Then set the required environment variables and start the server:

export OPENAI_API_KEY=your_openai_key
# .. set other environment variables

r2r serve --config-path=r2r.toml

Supported models include:

  • openai/text-embedding-3-small
  • openai/text-embedding-3-large
  • openai/text-embedding-ada-002