Embeddings in R2R

Introduction

R2R supports multiple Embedding providers, offering flexibility in choosing and switching between different models based on your specific requirements. This guide provides an in-depth look at configuring and using various Embedding providers within the R2R framework.

For a quick start on basic configuration, including embedding setup, please refer to our configuration guide.

Providers

R2R currently supports the following cloud embedding providers:

  • OpenAI
  • Azure
  • Cohere
  • HuggingFace
  • Bedrock (Amazon)
  • Vertex AI (Google)
  • Voyage AI

And for local inference:

  • Ollama
  • SentenceTransformers

Configuration

Update the embedding section in your r2r.toml file to configure your embedding provider. Here are some example configurations:
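
For instance, a minimal OpenAI-backed embedding section might look like the sketch below. The key names (provider, base_model, base_dimension, batch_size) follow the examples later in this guide; the specific values are illustrative and should be adjusted to the model you actually use:

[embedding]
provider = "openai"
base_model = "text-embedding-3-small"
base_dimension = 512
batch_size = 32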

Selecting Different Embedding Providers

R2R supports a wide range of embedding providers through LiteLLM. Here’s how to configure and use them:

export OPENAI_API_KEY=your_openai_key
# Update r2r.toml:
# "provider": "litellm" | "openai"
# "base_model": "text-embedding-3-small"
# "base_dimension": 512
r2r serve --config-path=r2r.toml

Supported models include:

  • text-embedding-3-small
  • text-embedding-3-large
  • text-embedding-ada-002
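
For local inference through Ollama, a comparable setup might look like the following sketch; `mxbai-embed-large` is only an illustrative Ollama embedding model, and base_dimension must match the output size of whichever model you actually pull:

# Update r2r.toml:
# "provider": "ollama"
# "base_model": "mxbai-embed-large"   # illustrative model name
# "base_dimension": 1024              # must match the model's output size
r2r serve --config-path=r2r.toml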

Embedding Service Endpoints

The EmbeddingProvider is responsible for core functionalities in these R2R endpoints:

  1. update_files: When updating existing files in the system
  2. ingest_files: During the ingestion of new files
  3. search: For embedding search queries
  4. rag: As part of the Retrieval-Augmented Generation process

Here’s how you can use these endpoints with embeddings:

File Ingestion

from r2r import R2R

app = R2R()

# Ingest a file, which will use the configured embedding model
response = app.ingest_files(["path/to/your/file.txt"])
print(f"Ingestion response: {response}")

Search

# Perform a search, which will embed the query using the configured model
search_results = app.search("Your search query here")
print(f"Search results: {search_results}")

RAG (Retrieval-Augmented Generation)

# Use RAG, which involves embedding for retrieval
rag_response = app.rag("Your question or prompt here")
print(f"RAG response: {rag_response}")

Updating Files

# Update existing files, which may involve re-embedding
update_response = app.update_files(["path/to/updated/file.txt"])
print(f"Update response: {update_response}")

Remember that you don’t directly call the embedding methods in your application code. R2R handles the embedding process internally based on your configuration.

Security Best Practices

  1. API Key Management: Use environment variables or secure key management solutions for API keys.
  2. Input Validation: Sanitize and validate all inputs before generating embeddings.
  3. Rate Limiting: Implement rate limiting to prevent abuse of embedding endpoints.
  4. Monitoring: Regularly monitor embedding usage for anomalies or misuse.

Custom Embedding Providers in R2R

You can create custom embedding providers by inheriting from the EmbeddingProvider class and implementing the required methods. This allows you to integrate any embedding model or service into R2R.

Embedding Provider Structure

The Embedding system in R2R is built on two main components:

  1. EmbeddingConfig: A configuration class for Embedding providers.
  2. EmbeddingProvider: An abstract base class that defines the interface for all Embedding providers.

EmbeddingConfig

The EmbeddingConfig class is used to configure Embedding providers:

from r2r.base import ProviderConfig
from typing import Optional

class EmbeddingConfig(ProviderConfig):
    provider: Optional[str] = None
    base_model: Optional[str] = None
    base_dimension: Optional[int] = None
    rerank_model: Optional[str] = None
    rerank_dimension: Optional[int] = None
    rerank_transformer_type: Optional[str] = None
    batch_size: int = 1
    prefixes: Optional[dict[str, str]] = None

    def validate(self) -> None:
        if self.provider not in self.supported_providers:
            raise ValueError(f"Provider '{self.provider}' is not supported.")

    @property
    def supported_providers(self) -> list[str]:
        return [None, "openai", "ollama", "sentence-transformers"]
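
As a quick illustration, assuming EmbeddingConfig accepts keyword construction (as the registration example later in this guide suggests), a config can be built and validated directly in Python; the model name and dimension below are illustrative:

from r2r.base import EmbeddingConfig

config = EmbeddingConfig(
    provider="sentence-transformers",
    base_model="all-MiniLM-L6-v2",  # illustrative local model (384-dimensional)
    base_dimension=384,
    batch_size=32,
)
config.validate()  # raises ValueError if the provider is not in supported_providers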

EmbeddingProvider

The EmbeddingProvider is an abstract base class that defines the common interface for all Embedding providers:

from abc import abstractmethod
from enum import Enum
from r2r.base import Provider
from r2r.abstractions.embedding import EmbeddingPurpose
from r2r.abstractions.search import VectorSearchResult

class EmbeddingProvider(Provider):
    class PipeStage(Enum):
        BASE = 1
        RERANK = 2

    def __init__(self, config: EmbeddingConfig):
        if not isinstance(config, EmbeddingConfig):
            raise ValueError("EmbeddingProvider must be initialized with a `EmbeddingConfig`.")
        super().__init__(config)

    @abstractmethod
    def get_embedding(
        self,
        text: str,
        stage: PipeStage = PipeStage.BASE,
        purpose: EmbeddingPurpose = EmbeddingPurpose.INDEX,
    ):
        pass

    @abstractmethod
    def get_embeddings(
        self,
        texts: list[str],
        stage: PipeStage = PipeStage.BASE,
        purpose: EmbeddingPurpose = EmbeddingPurpose.INDEX,
    ):
        pass

    @abstractmethod
    def rerank(
        self,
        query: str,
        results: list[VectorSearchResult],
        stage: PipeStage = PipeStage.RERANK,
        limit: int = 10,
    ):
        pass

    @abstractmethod
    def tokenize_string(
        self, text: str, model: str, stage: PipeStage
    ) -> list[int]:
        pass

    def set_prefixes(self, config_prefixes: dict[str, str], base_model: str):
        # Implementation of prefix setting
        pass

Creating a Custom Embedding Provider

To create a custom Embedding provider, follow these steps:

  1. Create a new class that inherits from EmbeddingProvider.
  2. Implement the required methods: get_embedding, get_embeddings, rerank, and tokenize_string.
  3. (Optional) Implement async versions of methods if needed.
  4. (Optional) Add any additional methods or attributes specific to your provider.

Here’s an example of a custom Embedding provider:

import numpy as np
from r2r.base import EmbeddingProvider, EmbeddingConfig
from r2r.abstractions.embedding import EmbeddingPurpose
from r2r.abstractions.search import VectorSearchResult

class CustomEmbeddingProvider(EmbeddingProvider):
    def __init__(self, config: EmbeddingConfig):
        super().__init__(config)
        # Initialize any custom attributes or models here
        self.model = self._load_custom_model(config.base_model)

    def _load_custom_model(self, model_name):
        # Load your custom embedding model here
        pass

    def get_embedding(
        self,
        text: str,
        stage: EmbeddingProvider.PipeStage = EmbeddingProvider.PipeStage.BASE,
        purpose: EmbeddingPurpose = EmbeddingPurpose.INDEX,
    ) -> list[float]:
        # Apply prefix if available
        if purpose in self.prefixes:
            text = f"{self.prefixes[purpose]}{text}"

        # Generate embedding using your custom model
        embedding = self.model.encode(text)
        return embedding.tolist()

    def get_embeddings(
        self,
        texts: list[str],
        stage: EmbeddingProvider.PipeStage = EmbeddingProvider.PipeStage.BASE,
        purpose: EmbeddingPurpose = EmbeddingPurpose.INDEX,
    ) -> list[list[float]]:
        # Apply prefixes if available
        if purpose in self.prefixes:
            texts = [f"{self.prefixes[purpose]}{text}" for text in texts]

        # Generate embeddings in batches
        all_embeddings = []
        for i in range(0, len(texts), self.config.batch_size):
            batch = texts[i : i + self.config.batch_size]
            batch_embeddings = self.model.encode(batch)
            all_embeddings.extend(batch_embeddings.tolist())
        return all_embeddings

    def rerank(
        self,
        query: str,
        results: list[VectorSearchResult],
        stage: EmbeddingProvider.PipeStage = EmbeddingProvider.PipeStage.RERANK,
        limit: int = 10,
    ) -> list[VectorSearchResult]:
        if not self.config.rerank_model:
            return results[:limit]

        # Implement custom reranking logic here
        # This is a simple example using dot product similarity
        query_embedding = self.get_embedding(query, stage, EmbeddingPurpose.QUERY)
        for result in results:
            result.score = np.dot(query_embedding, result.embedding)

        reranked_results = sorted(results, key=lambda x: x.score, reverse=True)
        return reranked_results[:limit]

    def tokenize_string(
        self, text: str, model: str, stage: EmbeddingProvider.PipeStage
    ) -> list[int]:
        # Implement custom tokenization logic
        # This is a simple example using basic string splitting
        return [ord(char) for word in text.split() for char in word]

    # Optionally implement async versions of methods
    async def async_get_embedding(self, text: str, stage: EmbeddingProvider.PipeStage, purpose: EmbeddingPurpose):
        # Implement async version if needed
        return self.get_embedding(text, stage, purpose)

    async def async_get_embeddings(self, texts: list[str], stage: EmbeddingProvider.PipeStage, purpose: EmbeddingPurpose):
        # Implement async version if needed
        return self.get_embeddings(texts, stage, purpose)

Registering and Using the Custom Provider

To use your custom Embedding provider in R2R:

  1. Update the EmbeddingConfig class to include your custom provider:
class EmbeddingConfig(ProviderConfig):
    # ...existing code...

    @property
    def supported_providers(self) -> list[str]:
        return [None, "openai", "ollama", "sentence-transformers", "custom"]  # Add your custom provider here
  2. Update your R2R configuration to use the custom provider:
[embedding]
provider = "custom"
base_model = "your-custom-model"
base_dimension = 768
batch_size = 32

[embedding.prefixes]
index = "Represent this document for retrieval: "
query = "Represent this query for retrieving relevant documents: "
  3. In your R2R application, register the custom provider:
from r2r import R2R
from r2r.base import EmbeddingConfig
from your_module import CustomEmbeddingProvider

def get_embedding_provider(config: EmbeddingConfig):
    if config.provider == "custom":
        return CustomEmbeddingProvider(config)
    # ... handle other providers ...

r2r = R2R(embedding_provider_factory=get_embedding_provider)

Now you can use your custom Embedding provider seamlessly within your R2R application:

# Ingest documents (embeddings will be generated using your custom provider)
r2r.ingest_files(["path/to/document.txt"])

# Perform a search
results = r2r.search("Your search query")

# Use RAG
rag_response = r2r.rag("Your question here")

By following this structure, you can integrate any embedding model or service into R2R, maintaining consistency with the existing system while adding custom functionality as needed. This approach allows for great flexibility in choosing or implementing embedding solutions that best fit your specific use case.

Embedding Prefixes

R2R supports embedding prefixes to enhance embedding quality for different purposes:

  1. Index Prefixes: Applied to documents during indexing.
  2. Query Prefixes: Applied to search queries.

Configure prefixes in your r2r.toml or when initializing the EmbeddingConfig.
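
For example, assuming keyword construction as in the earlier config sketch, the same prefixes shown in the r2r.toml snippet above can be passed directly to EmbeddingConfig; the prefix strings themselves are illustrative:

from r2r.base import EmbeddingConfig

config = EmbeddingConfig(
    provider="openai",
    base_model="text-embedding-3-small",
    base_dimension=512,
    prefixes={
        "index": "Represent this document for retrieval: ",
        "query": "Represent this query for retrieving relevant documents: ",
    },
)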

Troubleshooting

Common issues and solutions:

  1. API Key Errors: Ensure your API keys are correctly set and have the necessary permissions.
  2. Dimension Mismatch: Verify that the base_dimension in your config matches the actual output of the chosen model (see the sanity check sketched after this list).
  3. Out of Memory Errors: Adjust the batch size or choose a smaller model if encountering memory issues with local models.
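
For dimension mismatches specifically, one quick sanity check is to embed a short string with your provider's own SDK and compare the vector length against base_dimension. A hedged example using the OpenAI Python client (the model and dimension below are illustrative):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="dimension check",
    dimensions=512,  # text-embedding-3 models accept a reduced output dimension
)
print(len(resp.data[0].embedding))  # should equal base_dimension in r2r.toml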

Performance Considerations

  1. Batching: Use batching for multiple, similar requests to improve throughput.
  2. Model Selection: Balance between model capability and inference speed based on your use case.
  3. Caching: Implement caching strategies to avoid re-embedding identical text (a simple sketch follows this list).
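
A minimal sketch of the caching idea, assuming a custom provider like the CustomEmbeddingProvider example above: an in-memory dictionary keyed by purpose and text. A production setup would more likely use a bounded or persistent cache:

from r2r.base import EmbeddingProvider, EmbeddingConfig
from r2r.abstractions.embedding import EmbeddingPurpose
from your_module import CustomEmbeddingProvider  # the custom provider sketched earlier

class CachedEmbeddingProvider(CustomEmbeddingProvider):
    def __init__(self, config: EmbeddingConfig):
        super().__init__(config)
        self._cache: dict[str, list[float]] = {}

    def get_embedding(
        self,
        text: str,
        stage: EmbeddingProvider.PipeStage = EmbeddingProvider.PipeStage.BASE,
        purpose: EmbeddingPurpose = EmbeddingPurpose.INDEX,
    ) -> list[float]:
        # Reuse a previously computed vector for identical text and purpose
        key = f"{purpose}:{text}"
        if key not in self._cache:
            self._cache[key] = super().get_embedding(text, stage, purpose)
        return self._cache[key]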

Conclusion

R2R’s Embedding system provides a flexible and powerful foundation for integrating various embedding models into your applications. By understanding the available providers, configuration options, and best practices, you can effectively leverage embeddings to enhance your R2R-based projects.

For an advanced example of implementing reranking in R2R, refer to the reranking documentation.