Introduction

This guide extends the R2R Quickstart by demonstrating how R2R supports multiple large language models (LLMs). Multi-LLM support lets you pair the same search and retrieval pipeline with whichever model best fits your performance, cost, or deployment requirements.

The LLM is selected at runtime, on a per-request basis, so you can switch providers or models without reinstalling or reconfiguring R2R.
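Because selection happens per request, switching providers amounts to changing a single model string at call time. For example, with the quickstart CLI used throughout this guide, the default OpenAI path looks like this (a sketch; the gpt-4-turbo model name is illustrative, and the Anthropic example later in this guide differs only in that string):

# export OPENAI_API_KEY=...
python -m r2r.examples.quickstart rag --query="Who was Aristotle?" \
--rag_generation_config='{"model":"gpt-4-turbo"}'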

Setup

This guide assumes R2R is already installed and the basic quickstart has been completed.

Using Different LLM Providers

Sample Commands

If you haven’t completed the quickstart or if your target database is empty, start by ingesting sample files:

# export OPENAI_API_KEY=...
python -m r2r.examples.quickstart ingest_files --no-media=true
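The same step can also be driven from Python rather than the command line. A minimal sketch, assuming the quickstart module exposes the R2RQuickstart class that its CLI wraps (check your installed version):

from r2r.examples.quickstart import R2RQuickstart

# Instantiate the quickstart wrapper and ingest the bundled sample files,
# skipping media files just as --no-media=true does on the CLI
quickstart = R2RQuickstart()
quickstart.ingest_files(no_media=True)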

With data ingested, we are ready to test RAG against different LLM providers and models.

To use Anthropic’s models, set the ANTHROPIC_API_KEY as an environment variable and specify the model:

By default, an OPENAI_API_KEY is still required for embeddings.

# export OPENAI_API_KEY=...
# export ANTHROPIC_API_KEY=...
python -m r2r.examples.quickstart rag --query="Who was Aristotle?" \
--rag_generation_config='{"model":"claude-3-haiku-20240307"}'

Example output:

{
  'results': [
    {
      'id': 'chatcmpl-5bb806e4-f6c2-40c1-a9d4-c2447c8e906d',
      'choices': [
        {
          'message': {
            'content': 'Based on the context provided, Aristotle was:\n\n1. An Ancient Greek philosopher and polymath [1] who lived from 384-322 BC [3].\n2. He was the founder of the Peripatetic school of philosophy in the Lyceum in Athens...',
            'role': 'assistant'
          }
        }
      ],
      'model': 'claude-3-haiku-20240307',
      'usage': {
        'completion_tokens': 305,
        'prompt_tokens': 1403,
        'total_tokens': 1708
      }
    }
  ]
}
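Local models follow the same pattern. For example, with an Ollama server running locally, you might select a local model like so (a sketch: the ollama/llama2 model string follows the LiteLLM naming convention and assumes that model has been pulled and is being served by Ollama):

# export OPENAI_API_KEY=...  # still required for embeddings by default
python -m r2r.examples.quickstart rag --query="Who was Aristotle?" \
--rag_generation_config='{"model":"ollama/llama2"}'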

Sample Code

The LLM selected in the commands above is propagated to the R2R rag method as part of the GenerationConfig supplied via the rag_generation_config argument. A simplified example of this logic is shown below:

from r2r import VectorSearchSettings, GenerationConfig

# Restrict the vector search, here to a single user's documents
vector_search_settings = VectorSearchSettings(
    search_filters={"user_id": user1_id},
    # ... other search settings elided
)

# Select the LLM and sampling parameters for the generation step
rag_generation_config = GenerationConfig(
    model="claude-3-haiku-20240307",
    temperature=0.2,
    # ... other generation settings elided
)

# `app` is the R2R application object assembled in the quickstart
rag_results = app.rag(
    query="Explain AI briefly",
    vector_search_settings=vector_search_settings,
    rag_generation_config=rag_generation_config,
)
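Given the response shape shown in the example output above, the generated answer can then be read out of the results. A minimal sketch, assuming the same dict-shaped payload (exact fields may vary between versions):

# Pull the assistant's reply out of the first completion in the payload
completion = rag_results["results"][0]
answer = completion["choices"][0]["message"]["content"]
print(answer)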

Refer to the LLM Deep Dive for more information on how R2R supports different LLM providers.

Summary

This guide demonstrates R2R’s flexibility in serving multiple LLMs. By choosing between hosted models from providers like OpenAI and Anthropic, as well as local options like Ollama, you control how user responses are generated and can optimize for performance, cost, or the specific requirements of your RAG application.

For detailed setup and basic functionality, refer back to the R2R Quickstart. For more advanced usage and customization options, join the R2R Discord community.