Introduction

This guide extends the R2R Quickstart by demonstrating how to run R2R with a local Large Language Model (LLM). We’ll walk through setting up a complete local Retrieval-Augmented Generation (RAG) system.

You may skip the commands below if you have already completed the quickstart.

pip install 'r2r'

# Postgres + pgvector is the default vector db.
export POSTGRES_USER=$YOUR_POSTGRES_USER
export POSTGRES_PASSWORD=$YOUR_POSTGRES_PASSWORD
export POSTGRES_HOST=$YOUR_POSTGRES_HOST
export POSTGRES_PORT=$YOUR_POSTGRES_PORT
export POSTGRES_DBNAME=$YOUR_POSTGRES_DBNAME
export POSTGRES_VECS_COLLECTION=demo_vecs
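
If you want to sanity-check these settings before going further, a minimal script along the following lines will confirm that the database is reachable and that the pgvector extension is installed. It uses psycopg2, which is not part of the quickstart instructions, so install it separately if needed:

import os
import psycopg2

# Connect using the same environment variables exported above.
conn = psycopg2.connect(
    user=os.environ["POSTGRES_USER"],
    password=os.environ["POSTGRES_PASSWORD"],
    host=os.environ["POSTGRES_HOST"],
    port=os.environ["POSTGRES_PORT"],
    dbname=os.environ["POSTGRES_DBNAME"],
)
with conn.cursor() as cur:
    # pgvector must be available, since it is R2R's default vector database.
    cur.execute("SELECT extname FROM pg_extension WHERE extname = 'vector'")
    print("pgvector installed:", cur.fetchone() is not None)
conn.close()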

R2R uses Ollama by default for local LLM inference. Install Ollama by following the instructions on its official website or GitHub README.

Configuration

R2R reads its settings from a config.json file. For this local setup, we’ll use the default local_ollama configuration, which you can customize to your needs by setting up a standalone project.
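
As a rough sketch of what the local_ollama configuration covers, the important pieces are the embedding settings, the LLM provider, and the vector database. The field names below are a simplified summary, not the literal config.json schema; the server startup logs later in this guide show the actual values:

# Simplified summary of the local_ollama setup -- not the literal config.json keys.
local_ollama_settings = {
    "embedding": {
        "provider": "ollama",
        "base_model": "mxbai-embed-large",
        "base_dimension": 1024,
        "text_splitter": {
            "type": "recursive_character",
            "chunk_size": 512,
            "chunk_overlap": 20,
        },
    },
    "llm": {"provider": "litellm", "model": "ollama/llama2"},
    "vector_database": {"provider": "pgvector", "collection_name": "demo_vecs"},
}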

Running Queries on the Local LLM + Embeddings

First, pull the required models and start the Ollama server:

ollama pull llama2
ollama pull mxbai-embed-large
ollama serve
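
Before moving on, you can confirm from another terminal that the server is running and that both models are available by querying Ollama's /api/tags endpoint, for example:

import requests

# List the models the local Ollama server has pulled (default port 11434).
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
print([m["name"] for m in resp.json()["models"]])
# Expect entries such as "llama2:latest" and "mxbai-embed-large:latest".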

Run the client commands below in a new terminal after starting the Ollama server.

Ingesting and Embedding Documents

To ingest sample documents (excluding media files):

python -m r2r.examples.quickstart ingest_files --no-media=true --config_name=local_ollama

This command processes the documents, splits them into chunks, embeds the chunks, and stores them in your specified Postgres database. Relational data is also stored to support downstream document management, which you can read about in the quickstart.
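
To make those steps concrete, here is a from-scratch sketch of the chunk → embed → store flow. It uses a naive character splitter, Ollama's /api/embeddings endpoint, and an illustrative demo_chunks table rather than R2R's internal schema; aristotle.txt stands in for whatever file you are ingesting:

import os
import psycopg2
import requests

def chunk(text, size=512, overlap=20):
    # Naive fixed-size character chunking; R2R's recursive splitter is smarter.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed(text):
    # mxbai-embed-large produces 1024-dimensional embeddings.
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "mxbai-embed-large", "prompt": text},
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

conn = psycopg2.connect(
    user=os.environ["POSTGRES_USER"],
    password=os.environ["POSTGRES_PASSWORD"],
    host=os.environ["POSTGRES_HOST"],
    port=os.environ["POSTGRES_PORT"],
    dbname=os.environ["POSTGRES_DBNAME"],
)
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute(
        "CREATE TABLE IF NOT EXISTS demo_chunks "
        "(id SERIAL PRIMARY KEY, content TEXT, embedding vector(1024))"
    )
    for piece in chunk(open("aristotle.txt").read()):
        vec = "[" + ",".join(str(x) for x in embed(piece)) + "]"
        cur.execute(
            "INSERT INTO demo_chunks (content, embedding) VALUES (%s, %s::vector)",
            (piece, vec),
        )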

To search the knowledge base:

python -m r2r.examples.quickstart search \
  --query="What contributions did Aristotle make to biology?" \
  --config_name=local_ollama
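
Under the hood, search embeds the query with the same local embedding model and ranks stored chunks by vector similarity. Continuing the illustrative demo_chunks example from the ingestion sketch above (pgvector's <=> operator is cosine distance):

import os
import psycopg2
import requests

query = "What contributions did Aristotle make to biology?"
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "mxbai-embed-large", "prompt": query},
)
query_vec = "[" + ",".join(str(x) for x in resp.json()["embedding"]) + "]"

conn = psycopg2.connect(
    user=os.environ["POSTGRES_USER"],
    password=os.environ["POSTGRES_PASSWORD"],
    host=os.environ["POSTGRES_HOST"],
    port=os.environ["POSTGRES_PORT"],
    dbname=os.environ["POSTGRES_DBNAME"],
)
with conn.cursor() as cur:
    # Rank chunks by cosine distance to the query embedding and keep the top 3.
    cur.execute(
        "SELECT content FROM demo_chunks ORDER BY embedding <=> %s::vector LIMIT 3",
        (query_vec,),
    )
    top_chunks = [row[0] for row in cur.fetchall()]
print(top_chunks)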

To perform RAG over the knowledge base:

python -m r2r.examples.quickstart rag \
  --query="What contributions did Aristotle make to biology?" \
  --config_name=local_ollama \
  --rag_generation_config='{"model": "ollama/llama2"}'

This command embeds the query, finds relevant chunks, and generates a response using the local LLM.
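
In a hand-rolled version of that flow, the retrieved chunks become context for the local model. Continuing the search sketch above, the generation step might look like this; the prompt template here is purely illustrative, not R2R's actual one:

import requests

query = "What contributions did Aristotle make to biology?"
# Replace with the chunk texts returned by the similarity search sketch above.
top_chunks = ["...retrieved chunk 1...", "...retrieved chunk 2..."]

prompt = (
    "Answer the question using only the context below.\n\n"
    "Context:\n" + "\n\n".join(top_chunks) + "\n\n"
    "Question: " + query
)
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": prompt, "stream": False},
)
resp.raise_for_status()
print(resp.json()["response"])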

Customizing Your RAG Pipeline

R2R offers flexibility in customizing various aspects of the RAG pipeline, such as the text splitter, the embedding model, and the generation model.

For more details on configuration options, see the R2R Configuration Documentation.
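
For instance, the --rag_generation_config flag shown above takes a JSON object. The model field is the one used in this guide; any additional sampling parameters depend on your R2R version's generation-config schema, so treat the extra key below as a placeholder to verify against the configuration documentation:

import json

# "model" matches the rag command above; "temperature" is only an example of the
# kind of knob a generation config can carry -- confirm the exact field names in
# the R2R Configuration Documentation before relying on them.
rag_generation_config = {"model": "ollama/llama2", "temperature": 0.1}
print(json.dumps(rag_generation_config))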

Running a Local Server

You may run a server locally by executing the command below, or by following the instructions in the Docker tab in the installation section.

python -m r2r.examples.quickstart serve --config_name=local_ollama

On startup, you should see log output similar to:
r2r.core.providers.vector_db_provider - INFO - Initializing VectorDBProvider with config extra_fields={} provider='pgvector' collection_name='demo_vecs'. - 2024-06-22 20:00:50,640
r2r.core.providers.embedding_provider - INFO - Initializing EmbeddingProvider with config extra_fields={'text_splitter': {'type': 'recursive_character', 'chunk_size': 512, 'chunk_overlap': 20}} provider='ollama' base_model='mxbai-embed-large' base_dimension=1024 rerank_model=None rerank_dimension=None rerank_transformer_type=None batch_size=32. - 2024-06-22 20:00:52,491
r2r.core.providers.llm_provider - INFO - Initializing LLM provider with config: extra_fields={} provider='litellm' - 2024-06-22 20:00:53,159
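
Once the server reports that its providers are initialized, you can confirm it is accepting connections before pointing any clients at it. A minimal check, assuming the server listens on localhost:8000 (adjust the host and port to match your serve output):

import socket

# Assumes the default local address; change it if your server binds elsewhere.
with socket.create_connection(("localhost", 8000), timeout=5):
    print("R2R server is accepting connections")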

Summary

In this guide, we’ve covered:

  1. Installing R2R for local RAG
  2. Configuring the R2R pipeline
  3. Ingesting and embedding documents
  4. Running queries on a local LLM

This is just the beginning of what you can build with R2R. Experiment with your own documents, customize the pipeline, and explore R2R’s capabilities further.

For detailed setup and basic functionality, refer back to the R2R Quickstart. For more advanced usage and customization options, join the R2R Discord community.