Local RAG System
Learn how to set up and run a Retrieval-Augmented Generation system locally using R2R
Introduction
This guide extends the R2R Quickstart by demonstrating how to run R2R with a local Large Language Model (LLM). We’ll walk through setting up a complete local Retrieval-Augmented Generation (RAG) system.
pip install 'r2r'
# Postgres + pgvector is the default vector db.
export POSTGRES_USER=$YOUR_POSTGRES_USER
export POSTGRES_PASSWORD=$YOUR_POSTGRES_PASSWORD
export POSTGRES_HOST=$YOUR_POSTGRES_HOST
export POSTGRES_PORT=$YOUR_POSTGRES_PORT
export POSTGRES_DBNAME=$YOUR_POSTGRES_DBNAME
export POSTGRES_VECS_COLLECTION=demo_vecs
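Before moving on, you can confirm that Postgres is reachable and that the pgvector extension is installed. A minimal check, assuming the psycopg2 driver is installed and the environment variables above are exported:
import os
import psycopg2

# Connect using the same environment variables R2R reads.
conn = psycopg2.connect(
    user=os.environ["POSTGRES_USER"],
    password=os.environ["POSTGRES_PASSWORD"],
    host=os.environ["POSTGRES_HOST"],
    port=os.environ["POSTGRES_PORT"],
    dbname=os.environ["POSTGRES_DBNAME"],
)
with conn.cursor() as cur:
    # pgvector must be installed for the default vector database to work.
    cur.execute("SELECT extname FROM pg_extension WHERE extname = 'vector'")
    print("pgvector installed:", cur.fetchone() is not None)
conn.close()
If the extension is missing, run CREATE EXTENSION vector; in the target database before ingesting documents.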
R2R uses ollama by default for local LLM inference. Install Ollama by following the instructions on their official website or GitHub README.
Configuration
R2R uses a config.json file for settings. For local setup, we'll use the default local_ollama configuration. This can be customized to your needs by setting up a standalone project.
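If you do set up a standalone project, the configuration is plain JSON, so it is easy to inspect or tweak with the standard library. A rough sketch (the file name and key names here are assumptions for illustration, not the exact schema):
import json

# Path is illustrative; point this at your project's config file.
with open("local_ollama.json") as f:
    config = json.load(f)

# Key names are assumptions; the point is simply that every provider
# (vector database, embeddings, LLM) is declared in one JSON file.
for section, settings in config.items():
    provider = settings.get("provider") if isinstance(settings, dict) else settings
    print(f"{section}: {provider}")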
Running Queries on the Local LLM + Embeddings
First, pull the required models and start the Ollama server:
ollama pull llama2
ollama pull mxbai-embed-large
ollama serve
Run the client commands below in a new terminal after starting the Ollama server.
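You can confirm the Ollama server is up and that both models were pulled before running the client commands; a quick check against Ollama's HTTP API (assuming the default port 11434 and the requests package):
import requests

# Ollama lists the locally pulled models at /api/tags on its default port.
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("Available models:", models)
# Expect entries such as "llama2:latest" and "mxbai-embed-large:latest".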
Ingesting and Embedding Documents
To ingest sample documents (excluding media files):
python -m r2r.examples.quickstart ingest_files --no-media=true --config_name=local_ollama
This command processes the documents, splits them into chunks, embeds the chunks, and stores them in your specified Postgres database. Relational data is also stored to support downstream document management, which you can read about in the quickstart.
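To make the ingestion step concrete, here is a rough sketch of what chunking and embedding look like, using the chunk size, overlap, and embedding model from the default local_ollama configuration. This is an illustration of the technique, not R2R's actual implementation; it calls Ollama's embeddings endpoint directly:
import requests

CHUNK_SIZE, CHUNK_OVERLAP = 512, 20  # matches the default text splitter settings

def chunk(text: str) -> list[str]:
    # Naive fixed-size splitting with overlap; R2R's recursive splitter is
    # smarter, but the idea is the same: break documents into embeddable pieces.
    step = CHUNK_SIZE - CHUNK_OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]

def embed(text: str) -> list[float]:
    # mxbai-embed-large produces 1024-dimensional vectors.
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "mxbai-embed-large", "prompt": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

document = "Aristotle was a Greek philosopher who studied living things..."
vectors = [embed(c) for c in chunk(document)]
print(len(vectors), "chunks embedded,", len(vectors[0]), "dimensions each")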
To search the knowledge base:
python -m r2r.examples.quickstart search \
--query="What contributions did Aristotle make to biology?" \
--config_name=local_ollama
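Under the hood, search embeds the query with the same model and ranks stored chunks by vector similarity. A minimal illustration of that ranking step (the chunks are placeholders; in R2R the lookup happens inside pgvector rather than in Python):
import math
import requests

def embed(text: str) -> list[float]:
    # Same Ollama embeddings call as in the ingestion sketch above.
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "mxbai-embed-large", "prompt": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query_vec = embed("What contributions did Aristotle make to biology?")
# Placeholder chunks standing in for the ingested documents.
chunks = ["Aristotle classified animals by their blood.", "Plato founded the Academy."]
ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
print(ranked[0])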
To perform RAG over the knowledge base:
python -m r2r.examples.quickstart rag \
--query="What contributions did Aristotle make to biology?" \
--config_name=local_ollama \
--rag_generation_config='{"model": "ollama/llama2"}'
This command embeds the query, finds relevant chunks, and generates a response using the local LLM.
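To sketch what that generation step looks like, the snippet below stuffs retrieved context into a prompt and calls Ollama's generate endpoint directly. It is an illustration only, not R2R's actual prompt or pipeline:
import requests

# Placeholder for the top retrieved chunks from the search step.
context = "Aristotle classified animals by their blood and studied marine life."
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    "Question: What contributions did Aristotle make to biology?"
)
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": prompt, "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])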
Customizing Your RAG Pipeline
R2R offers flexibility in customizing various aspects of the RAG pipeline, such as the embedding model, chunking behavior, and generation settings.
For more details on configuration options, see the R2R Configuration Documentation.
Running a Local Server
You may run a server locally by executing the command below, or by following the instructions in the Docker tab of the installation section.
python -m r2r.examples.quickstart serve --config_name=local_ollama
r2r.core.providers.vector_db_provider - INFO - Initializing VectorDBProvider with config extra_fields={} provider='pgvector' collection_name='demo_vecs'. - 2024-06-22 20:00:50,640
r2r.core.providers.embedding_provider - INFO - Initializing EmbeddingProvider with config extra_fields={'text_splitter': {'type': 'recursive_character', 'chunk_size': 512, 'chunk_overlap': 20}} provider='ollama' base_model='mxbai-embed-large' base_dimension=1024 rerank_model=None rerank_dimension=None rerank_transformer_type=None batch_size=32. - 2024-06-22 20:00:52,491
r2r.core.providers.llm_provider - INFO - Initializing LLM provider with config: extra_fields={} provider='litellm' - 2024-06-22 20:00:53,159
Summary
In this guide, we’ve covered:
- Installing R2R for local RAG
- Configuring the R2R pipeline
- Ingesting and embedding documents
- Running queries on a local LLM
This is just the beginning of what you can build with R2R. Experiment with your own documents, customize the pipeline, and explore R2R’s capabilities further.
For detailed setup and basic functionality, refer back to the R2R Quickstart. For more advanced usage and customization options, join the R2R Discord community.