Local RAG System
Learn how to set up and run a Retrieval-Augmented Generation system locally using R2R
Introduction
This guide extends the R2R Quickstart by demonstrating how to run R2R with a local Large Language Model (LLM). We’ll walk through setting up a complete local Retrieval-Augmented Generation (RAG) system.
Installation
To run local RAG with R2R, you must either run Ollama inside Docker by appending the --docker-ext-ollama
flag to the serve command, or install Ollama directly on your local system.
To install Ollama locally, follow the instructions on its official website or GitHub README.
Docker allows users to get started with R2R seamlessly—providing R2R, the R2R Dashboard, and a Postgres+pgvector database all in one place.
Run the following command to start all containers:
# `r2r docker-down` to bring down existing R2R Docker, if running.
r2r --config-name=local_ollama serve --docker --docker-ext-ollama
The R2R docker-compose includes a pre-configured container for Postgres+pgvector. Alternatively, you may override this behavior by specifying your own Postgres environment variables before calling docker-compose, as shown:
export POSTGRES_USER=$YOUR_POSTGRES_USER
export POSTGRES_PASSWORD=$YOUR_POSTGRES_PASSWORD
export POSTGRES_HOST=$YOUR_POSTGRES_HOST
export POSTGRES_PORT=$YOUR_POSTGRES_PORT
export POSTGRES_DBNAME=$YOUR_POSTGRES_DBNAME
export POSTGRES_VECS_COLLECTION=$MY_VECS_COLLECTION # see note below
docker-compose up -d
The POSTGRES_VECS_COLLECTION
environment variable defines the collection within your Postgres database where R2R-related tables reside. If the specified collection does not exist, R2R will create it during initialization.
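The variables above combine into a standard Postgres connection string. The sketch below shows how they fit together; it is illustrative only (the fallback values are assumptions, not R2R defaults), but it can help verify your environment is set correctly:

```python
import os

def postgres_dsn() -> str:
    """Assemble a Postgres connection string from the environment
    variables shown above. Fallback values are illustrative only."""
    user = os.environ.get("POSTGRES_USER", "postgres")
    password = os.environ.get("POSTGRES_PASSWORD", "postgres")
    host = os.environ.get("POSTGRES_HOST", "localhost")
    port = os.environ.get("POSTGRES_PORT", "5432")
    dbname = os.environ.get("POSTGRES_DBNAME", "postgres")
    return f"postgresql://{user}:{password}@{host}:{port}/{dbname}"

dsn = postgres_dsn()
```

Note that POSTGRES_VECS_COLLECTION is not part of the connection string itself; it only names the collection R2R uses inside the database.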
Starting the Ollama server
Next, make sure that your Ollama server is online with the necessary dependencies:
# Check the name of the ollama container and modify the command if it differs from r2r-ollama-1
docker exec -it r2r-ollama-1 ollama pull llama3.1
docker exec -it r2r-ollama-1 ollama pull mxbai-embed-large
Configuration
R2R uses an r2r.json
file for settings. For local setup, we’ll use the default local_ollama
configuration. You can customize this to your needs by setting up a standalone project.
Starting the R2R server
The Docker installation launches an R2R API server over port 8000.
Run the client commands below in a new terminal after starting the Ollama server.
Interacting with R2R
Ingest
To ingest sample documents (excluding media files):
r2r ingest-sample-file
This command processes the ingested documents, splits them into chunks, embeds the chunks, and stores them in your specified Postgres database. Relational data is also stored to allow for downstream document management, which you can read about in the quickstart.
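Conceptually, the chunking step splits each document into overlapping windows before embedding. The following is a simplified sketch of that idea; the chunk size, overlap, and character-based splitting are illustrative assumptions, not R2R's actual ingestion parameters:

```python
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character chunks with overlap,
    mirroring the chunking step of an ingestion pipeline (toy version)."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("Aristotle was a Greek philosopher and polymath.", chunk_size=20, overlap=5)
```

The overlap between consecutive chunks helps preserve context that would otherwise be cut at chunk boundaries, improving retrieval quality.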
Search
To search the knowledge base:
r2r search --query="Who was Aristotle?"
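Under the hood, vector search embeds the query and ranks stored chunk embeddings by similarity. A toy illustration of that ranking step, using made-up three-dimensional vectors (real embeddings come from a model like mxbai-embed-large and have far more dimensions):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for stored chunks -- values are fabricated for illustration.
chunk_vectors = {
    "Aristotle was a Greek philosopher.": [0.9, 0.1, 0.0],
    "Photosynthesis occurs in plants.":   [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "Who was Aristotle?"
best = max(chunk_vectors, key=lambda c: cosine(chunk_vectors[c], query_vec))
```

In R2R this ranking happens inside Postgres via pgvector rather than in Python, but the principle is the same.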
RAG
To perform RAG over the knowledge base:
# customize LLM model with a flag like `--rag-model="ollama/llama3.1"`
r2r rag --query="Who was Aristotle?"
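A RAG completion combines the retrieved chunks with the user's question into a single prompt for the LLM. The sketch below shows that assembly step; the template wording is a hypothetical example, not R2R's actual prompt:

```python
def build_rag_prompt(query: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved context with the user query into one prompt,
    the core of a RAG completion. Template wording is illustrative."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

prompt = build_rag_prompt("Who was Aristotle?", ["Aristotle was a Greek philosopher."])
```

The completed prompt is then sent to the local model (here, llama3.1 via Ollama) to generate the final answer.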
Streaming
To perform streaming RAG over the knowledge base:
r2r rag --query="What contributions did Aristotle make to biology?" --stream
This command is the same as the one used for a basic RAG completion, except that the result is streamed back in real time.
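The difference between the two modes can be sketched with a generator: instead of returning the full completion at once, tokens are yielded as they arrive. The token source here is mocked; a real client would read them from the server:

```python
from typing import Iterator

def stream_completion(tokens: list[str]) -> Iterator[str]:
    """Yield tokens one at a time, as a streaming RAG endpoint would."""
    for token in tokens:
        yield token

# Mocked LLM output for illustration only.
pieces = ["Aristotle ", "made ", "foundational ", "contributions ", "to ", "biology."]
answer = "".join(stream_completion(pieces))
```

Streaming lets a UI display partial output immediately, which matters for the longer generations typical of local models.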
Summary
In this guide, we’ve covered:
- Installing R2R for local RAG
- Configuring the R2R pipeline
- Ingesting and embedding documents
- Running search and RAG on a local LLM
This is just the beginning of what you can build with R2R. Experiment with your own documents, customize the pipeline, and explore R2R’s capabilities further.
For detailed setup and basic functionality, refer back to the R2R Quickstart. For more advanced usage and customization options, join the R2R Discord community.