Configuring Your RAG Pipeline

The R2R library provides flexibility in customizing various aspects of the RAG pipeline to suit your specific needs.

The providers for your RAG pipeline are configured through the config.json file. The supported options are described below.
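Since config.json is plain JSON, you can inspect it with any JSON tooling. The short Python sketch below simply loads the file and prints the configured sections; it assumes config.json sits in the directory you run R2R from.

```python
import json

# Load the pipeline configuration; the path "config.json" assumes the file
# sits in the directory you run R2R from.
with open("config.json") as f:
    config = json.load(f)

# Print each top-level provider section and its settings.
for section, settings in config.items():
    print(section, "->", settings)
```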

Vector Database Provider

R2R supports multiple vector database providers, including:

  • local: A local vector database implementation backed by SQLite.
  • qdrant: Integration with Qdrant, a high-performance vector similarity search engine.
  • pgvector: Integration with PGVector, a vector similarity search extension for PostgreSQL.
  • sciphi: Managed PGVector database from SciPhi.

To specify the vector database provider, set the provider field under vector_database in the config.json file. Make sure to provide the necessary connection details and credentials for your chosen provider.
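As a rough sketch (not the library's own tooling), the snippet below shows one way to switch the vector database provider by editing config.json with Python. Only the vector_database section and its provider field are documented here; the collection_name key is a hypothetical placeholder for provider-specific connection details.

```python
import json

# A minimal sketch of switching the vector database provider in config.json.
# Only the "vector_database" section and its "provider" field are documented
# above; the "collection_name" key is a hypothetical example of a connection
# detail, and credentials are better supplied via environment variables.
with open("config.json") as f:
    config = json.load(f)

config["vector_database"] = {
    "provider": "qdrant",            # one of: local, qdrant, pgvector, sciphi
    "collection_name": "demo_vecs",  # hypothetical; required keys vary by provider
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```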

For more information, refer to vector database providers.

Embedding Provider

R2R supports OpenAI and local inference as embedding providers. To configure the embedding settings, update the embedding section in the config.json file, specifying the desired embedding model, dimension, and batch size according to your requirements; a sketch of this section follows the list below. Support for additional providers can be added by request.

  • openai: Integration with OpenAI, supporting models like text-embedding-3-small and text-embedding-3-large.
  • sentence-transformers: Integration with the sentence transformers library, providing support for models available on HuggingFace, like mixedbread-ai/mxbai-embed-large-v1.
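For illustration, the embedding section might carry settings along the following lines; the key names model, dimension, and batch_size mirror the settings mentioned above, but their exact spelling is an assumption.

```python
# A sketch of the "embedding" section of config.json, written as a Python
# dict. The section name comes from the prose above; the exact key names
# ("model", "dimension", "batch_size") mirror the settings it mentions but
# are assumptions about how they are spelled.
embedding_section = {
    "provider": "openai",               # or "sentence-transformers"
    "model": "text-embedding-3-small",  # embedding model to use
    "dimension": 1536,                  # must match the model's output size
    "batch_size": 32,                   # texts embedded per request
}
```

The dimension must agree with the chosen model's output size, and the batch size trades throughput against per-request load.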

For more information, refer to embedding providers.

Language Model Provider

R2R supports the following LLM providers:

  • openai: Integration with OpenAI, supporting models like gpt-3.5-turbo.
  • litellm (default): Integration with many LLM providers, such as those listed below:
    • OpenAI
    • ollama
    • Anthropic
    • Vertex AI
    • HuggingFace
    • ...
  • llama-cpp: Integration with the llama-cpp library for local inference.
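For illustration only, an LLM provider section might look like the sketch below; the section name language_model and its keys are assumptions, with only the provider values taken from the list above.

```python
# A sketch of an LLM provider section for config.json, written as a Python
# dict. The section name "language_model" and the key names are assumptions;
# only the provider values (openai, litellm, llama-cpp) come from the list above.
language_model_section = {
    "provider": "litellm",     # default; alternatives: "openai", "llama-cpp"
    "model": "gpt-3.5-turbo",  # hypothetical key; model named in the list above
}
```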

For more information, refer to llm providers.

Evaluation Provider

R2R supports DeepEval and PareaAI as evaluation providers. These providers let you evaluate the performance and quality of your RAG pipeline on a sampled fraction of queries.

  • provider: Specifies the evaluation provider to use (deepeval or pareaai).
  • sampling_fraction: Determines how often the pipeline should be evaluated. It represents the fraction of queries that should trigger an evaluation. For example, a sampling_fraction of 0.1 means that approximately 10% of the queries will be evaluated.
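Putting those two fields together, an evaluation section might look like the following sketch; the section name evals is an assumption.

```python
# A sketch of the evaluation settings described above. The "provider" and
# "sampling_fraction" fields are named in the docs; the section name "evals"
# is an assumption.
evals_section = {
    "provider": "deepeval",    # or "pareaai"
    "sampling_fraction": 0.1,  # evaluate roughly 10% of queries
}
```

Lowering sampling_fraction reduces evaluation overhead at the cost of coverage.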

Logging Provider

The R2R library supports the following logging providers for storing execution logs of the RAG pipeline:

  • postgres: Logs pipeline execution information to a PostgreSQL database.
  • local: Logs pipeline execution information to a local SQLite database.
  • redis: Logs pipeline execution information to a Redis database.
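For illustration, a logging section might be as simple as the sketch below; the section name is an assumption, and connection details for PostgreSQL or Redis would typically come from environment variables.

```python
# A sketch of a logging provider section. The provider values come from the
# list above; the section name "logging" is an assumption, and connection
# details (e.g. a Postgres or Redis URL) would normally be supplied via
# environment variables rather than hard-coded here.
logging_section = {
    "provider": "postgres",  # or "local", "redis"
}
```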