Local RAG System
Learn how to set up and run a Retrieval-Augmented Generation system locally using R2R
Introduction
This guide extends the R2R Quickstart by demonstrating how to run R2R with a local Large Language Model (LLM). We’ll walk through setting up a complete local Retrieval-Augmented Generation (RAG) system.
pip install 'r2r'
# Postgres + pgvector is the default vector db.
export POSTGRES_USER=$YOUR_POSTGRES_USER
export POSTGRES_PASSWORD=$YOUR_POSTGRES_PASSWORD
export POSTGRES_HOST=$YOUR_POSTGRES_HOST
export POSTGRES_PORT=$YOUR_POSTGRES_PORT
export POSTGRES_DBNAME=$YOUR_POSTGRES_DBNAME
export POSTGRES_VECS_COLLECTION=demo_vecs
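Before moving on, you can confirm that Postgres is reachable and that the pgvector extension is installed. A minimal check, assuming the psycopg2 driver is installed and the environment variables above are exported:
import os
import psycopg2

# Connect using the same environment variables R2R reads.
conn = psycopg2.connect(
    user=os.environ["POSTGRES_USER"],
    password=os.environ["POSTGRES_PASSWORD"],
    host=os.environ["POSTGRES_HOST"],
    port=os.environ["POSTGRES_PORT"],
    dbname=os.environ["POSTGRES_DBNAME"],
)
with conn.cursor() as cur:
    # pgvector must be installed for the default vector database to work.
    cur.execute("SELECT extname FROM pg_extension WHERE extname = 'vector'")
    print("pgvector installed:", cur.fetchone() is not None)
conn.close()
If the extension is missing, run CREATE EXTENSION vector; in the target database before ingesting documents.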
R2R uses ollama by default for local LLM inference. Install Ollama by following the instructions on their official website or GitHub README.
Configuration
R2R uses a config.json file for settings. For local setup, we'll use the default local_ollama configuration. This can be customized to your needs by setting up a standalone project.
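If you do set up a standalone project, the configuration is plain JSON, so it is easy to inspect or tweak with the standard library. A rough sketch (the file name and key names here are assumptions for illustration, not the exact schema):
import json

# Path is illustrative; point this at your project's config file.
with open("local_ollama.json") as f:
    config = json.load(f)

# Key names are assumptions; the point is simply that every provider
# (vector database, embeddings, LLM) is declared in one JSON file.
for section, settings in config.items():
    provider = settings.get("provider") if isinstance(settings, dict) else settings
    print(f"{section}: {provider}")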
Running Queries on the Local LLM + Embeddings
First, pull the required models and start the Ollama server:
ollama pull llama2
ollama pull mxbai-embed-large
ollama serve
Run the client commands below in a new terminal after starting the Ollama server.
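You can confirm the Ollama server is up and that both models were pulled before running the client commands; a quick check against Ollama's HTTP API (assuming the default port 11434 and the requests package):
import requests

# Ollama lists the locally pulled models at /api/tags on its default port.
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("Available models:", models)
# Expect entries such as "llama2:latest" and "mxbai-embed-large:latest".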
Ingesting and Embedding Documents
To ingest sample documents (excluding media files):
python -m r2r.examples.quickstart ingest_files --no-media=true --config_name=local_ollama
This command processes the documents, splits them into chunks, embeds the chunks, and stores them in your specified Postgres database. Relational data is also stored to support downstream document management, which you can read about in the quickstart.
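To make the ingestion step concrete, here is a rough sketch of what chunking and embedding look like, using the chunk size, overlap, and embedding model from the default local_ollama configuration. This is an illustration of the technique, not R2R's actual implementation; it calls Ollama's embeddings endpoint directly:
import requests

CHUNK_SIZE, CHUNK_OVERLAP = 512, 20  # matches the default text splitter settings

def chunk(text: str) -> list[str]:
    # Naive fixed-size splitting with overlap; R2R's recursive splitter is
    # smarter, but the idea is the same: break documents into embeddable pieces.
    step = CHUNK_SIZE - CHUNK_OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]

def embed(text: str) -> list[float]:
    # mxbai-embed-large produces 1024-dimensional vectors.
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "mxbai-embed-large", "prompt": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

document = "Aristotle was a Greek philosopher who studied living things..."
vectors = [embed(c) for c in chunk(document)]
print(len(vectors), "chunks embedded,", len(vectors[0]), "dimensions each")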
To search the knowledge base:
python -m r2r.examples.quickstart search \
--query="What contributions did Aristotle make to biology?" \
--config_name=local_ollama
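Under the hood, search embeds the query with the same model and ranks stored chunks by vector similarity. A minimal illustration of that ranking step (the chunks are placeholders; in R2R the lookup happens inside pgvector rather than in Python):
import math
import requests

def embed(text: str) -> list[float]:
    # Same Ollama embeddings call as in the ingestion sketch above.
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "mxbai-embed-large", "prompt": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query_vec = embed("What contributions did Aristotle make to biology?")
# Placeholder chunks standing in for the ingested documents.
chunks = ["Aristotle classified animals by their blood.", "Plato founded the Academy."]
ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
print(ranked[0])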
To perform RAG over the knowledge base:
python -m r2r.examples.quickstart rag \
--query="What contributions did Aristotle make to biology?" \
--config_name=local_ollama \
--rag_generation_config='{"model": "ollama/llama2"}'
This command embeds the query, finds relevant chunks, and generates a response using the local LLM.
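To sketch what that generation step looks like, the snippet below stuffs retrieved context into a prompt and calls Ollama's generate endpoint directly. It is an illustration only, not R2R's actual prompt or pipeline:
import requests

# Placeholder for the top retrieved chunks from the search step.
context = "Aristotle classified animals by their blood and studied marine life."
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    "Question: What contributions did Aristotle make to biology?"
)
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": prompt, "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])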
Customizing Your RAG Pipeline
R2R offers flexibility in customizing various aspects of the RAG pipeline, such as the embedding model, chunking behavior, and generation settings.
For more details on configuration options, see the R2R Configuration Documentation.
Running a Local Server
You may run a server locally by executing the command below, or by following the instructions in the Docker tab of the installation section.
python -m r2r.examples.quickstart serve --config_name=local_ollama
r2r.core.providers.vector_db_provider - INFO - Initializing VectorDBProvider with config extra_fields={} provider='pgvector' collection_name='demo_vecs'. - 2024-06-22 20:00:50,640
r2r.core.providers.embedding_provider - INFO - Initializing EmbeddingProvider with config extra_fields={'text_splitter': {'type': 'recursive_character', 'chunk_size': 512, 'chunk_overlap': 20}} provider='ollama' base_model='mxbai-embed-large' base_dimension=1024 rerank_model=None rerank_dimension=None rerank_transformer_type=None batch_size=32. - 2024-06-22 20:00:52,491
r2r.core.providers.llm_provider - INFO - Initializing LLM provider with config: extra_fields={} provider='litellm' - 2024-06-22 20:00:53,159
Summary
In this guide, we’ve covered:
- Installing R2R for local RAG
- Configuring the R2R pipeline
- Ingesting and embedding documents
- Running queries on a local LLM
This is just the beginning of what you can build with R2R. Experiment with your own documents, customize the pipeline, and explore R2R’s capabilities further.
For detailed setup and basic functionality, refer back to the R2R Quickstart. For more advanced usage and customization options, join the R2R Discord community.