This guide extends the R2R Quickstart by demonstrating how to run R2R with a local large language model (LLM). We’ll walk through setting up a complete local Retrieval-Augmented Generation (RAG) system.

To run R2R with local RAG, install R2R, set the Postgres connection variables, and make sure Ollama is installed:

pip install 'r2r'

# Postgres + pgvector is the default vector db.
export POSTGRES_USER=your_user
export POSTGRES_PASSWORD=your_password
export POSTGRES_HOST=your_host
export POSTGRES_PORT=your_port
export POSTGRES_DBNAME=your_db
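
Before starting R2R, it can help to confirm that all five variables are actually set in the current shell. The following is a minimal sketch using only the Python standard library; the helper names `missing_postgres_vars` and `port_is_valid` are hypothetical and are not part of R2R.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = (
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_HOST",
    "POSTGRES_PORT",
    "POSTGRES_DBNAME",
)


def missing_postgres_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def port_is_valid(value):
    """Return True if value looks like a TCP port number (1-65535)."""
    return value.isascii() and value.isdigit() and 1 <= int(value) <= 65535
```

Calling `missing_postgres_vars()` with no arguments inspects the real environment. An empty list means every variable is present, though it does not prove the database is reachable.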

R2R supports Ollama by default for local LLM inference. Install Ollama by following the instructions on their official website or GitHub README.


R2R uses a config.json file for settings. For local setup, we’ll use the default local_ollama configuration. This can be customized to your needs by setting up a standalone project.

Running Queries on the Local LLM + Embeddings

First, pull the LLM and embedding models and start the Ollama server:

ollama pull llama2
ollama pull mxbai-embed-large
ollama serve

Run client commands in a new terminal after starting the Ollama server.
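
If a client command fails, one thing worth checking is whether both models are visible to the running server. The sketch below assumes Ollama is listening on its default address, `http://localhost:11434`, and queries its `/api/tags` endpoint, which lists locally available models. The function names are hypothetical helpers, not part of R2R or Ollama.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default address.
OLLAMA_URL = "http://localhost:11434"
# Model names taken from the pull commands above.
REQUIRED_MODELS = ("llama2", "mxbai-embed-large")


def missing_models(tags_payload, required=REQUIRED_MODELS):
    """Return required models absent from an /api/tags response payload."""
    # Names are reported with a tag suffix such as "llama2:latest".
    available = {m["name"].split(":")[0] for m in tags_payload.get("models", [])}
    return [name for name in required if name not in available]


def fetch_tags(base_url=OLLAMA_URL, timeout=5.0):
    """Fetch the list of locally available models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `missing_models(fetch_tags())` returns the models that still need to be pulled. A connection error from `fetch_tags` usually means the server is not running or is bound to a different address.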

Ingesting and Embedding Documents

To ingest sample documents (excluding media files):

python -m r2r.examples.quickstart ingest_as_files --no-media=true --config_name=local_ollama

This command processes the documents, splits them into chunks, embeds the chunks, and stores them in your specified Postgres database. Relational data is also stored to support downstream document management, which is described in the quickstart.
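
To make the chunking step concrete, here is a simplified sketch of splitting text into overlapping windows. The defaults of 512 and 20 mirror the `chunk_size` and `chunk_overlap` values reported in the server log later in this guide. This is a plain fixed-window splitter written for illustration; it is not R2R's `recursive_character` splitter, which may also break on separators such as paragraphs and sentences.

```python
def chunk_text(text, chunk_size=512, chunk_overlap=20):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap means the tail of one chunk is repeated at the head of the next, so a sentence that straddles a boundary is still seen whole by at least one chunk.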

To search the knowledge base:

python -m r2r.examples.quickstart search \
  --query="What contributions did Aristotle make to biology?" \
  --config_name=local_ollama

To perform RAG over the knowledge base:

python -m r2r.examples.quickstart rag \
  --query="What contributions did Aristotle make to biology?" \
  --config_name=local_ollama \
  --rag_generation_config='{"model": "ollama/llama2"}'

This command embeds the query, finds relevant chunks, and generates a response using the local LLM.
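
The three steps in that sentence can be sketched in a few lines. The example below is a toy illustration that ranks chunks by cosine similarity and assembles a prompt from the best matches. In R2R the embeddings come from Ollama and the search runs in Postgres with pgvector; the similarity measure, the prompt wording, and all function names here are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=2):
    """Given (text, vector) pairs, return the k texts most similar to the query."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(query, context_chunks):
    """Combine retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The resulting prompt is what would be handed to the LLM, here the `ollama/llama2` model named in the command above.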

Customizing Your RAG Pipeline

R2R lets you customize various aspects of the RAG pipeline. For details on the configuration options, see the R2R Configuration Documentation.

Running a Local Server

To run a server locally, execute the command below or follow the instructions in the Docker tab of the installation section.

python -m r2r.examples.quickstart serve --config_name=local_ollama

On startup, the server logs each provider as it initializes:

r2r.core.providers.vector_db_provider - INFO - Initializing VectorDBProvider with config extra_fields={} provider='pgvector' collection_name='demo_vecs'. - 2024-06-22 20:00:50,640
r2r.core.providers.embedding_provider - INFO - Initializing EmbeddingProvider with config extra_fields={'text_splitter': {'type': 'recursive_character', 'chunk_size': 512, 'chunk_overlap': 20}} provider='ollama' base_model='mxbai-embed-large' base_dimension=1024 rerank_model=None rerank_dimension=None rerank_transformer_type=None batch_size=32. - 2024-06-22 20:00:52,491
r2r.core.providers.llm_provider - INFO - Initializing LLM provider with config: extra_fields={} provider='litellm' - 2024-06-22 20:00:53,159
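
The startup log lines above are a quick way to confirm that the `local_ollama` configuration was picked up: the embedding provider should report `ollama` and the vector database provider `pgvector`. As a rough sketch, the helper below extracts the provider named on each line. It assumes the `<logger> - <LEVEL> - <message> - <timestamp>` layout seen above, which may differ between R2R versions, and `providers_from_log` is a hypothetical name.

```python
import re

# Matches lines of the form "<logger> - <LEVEL> - <message> - <timestamp>".
LOG_LINE = re.compile(
    r"^(?P<logger>[\w.]+) - (?P<level>[A-Z]+) - (?P<message>.*)"
    r" - (?P<timestamp>\d{4}-\d{2}-\d{2} [\d:,]+)$"
)
# Matches the provider='...' field inside the message.
PROVIDER = re.compile(r"provider='([^']+)'")


def providers_from_log(lines):
    """Map each logger name to the provider reported in its startup line."""
    found = {}
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the expected layout
        provider = PROVIDER.search(match.group("message"))
        if provider:
            found[match.group("logger")] = provider.group(1)
    return found
```

Feeding it the three lines above yields one entry per provider module, which can then be compared against the values you expect for a local setup.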


In this guide, we’ve covered:

  1. Installing R2R for local RAG
  2. Configuring the R2R pipeline
  3. Ingesting and embedding documents
  4. Running queries on a local LLM

This is just the beginning of what you can build with R2R. Experiment with your own documents, customize the pipeline, and explore R2R’s capabilities further.

For more information and support, visit the R2R GitHub repository or join the R2R Discord community.