Local LLMs
Learn how to run a Retrieval-Augmented Generation system locally using R2R
Introduction
To run R2R with the default local LLM settings, execute:
r2r serve --docker --config-name=local_llm
R2R supports RAG with local LLMs through the Ollama library. You can follow the instructions on the official Ollama website to install Ollama outside of the R2R Docker setup. To run Ollama inside the R2R Docker setup instead, add --exclude-ollama=False to the command shown above.
For MacBooks with M1 or newer processors, we recommend setting the --exclude-ollama flag to True and installing Ollama outside of Docker.
This approach is recommended because Docker doesn’t fully support hardware acceleration on Apple Silicon, which can limit performance.
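The recommendation above can be sketched as a small shell helper. This is only an illustration: the function name exclude_ollama_value is hypothetical, the --exclude-ollama=True spelling is assumed by analogy with the --exclude-ollama=False form shown earlier, and the sketch assumes you want Ollama inside Docker on every other platform, which may not match your setup.

```shell
# Print "True" when Ollama should be kept out of Docker (Apple Silicon Macs),
# otherwise "False". Takes the OS name and machine architecture as optional
# arguments and falls back to what `uname` reports for the current host.
# Note: a shell running under Rosetta reports x86_64, so pass the values
# explicitly if detection looks wrong.
exclude_ollama_value() {
  os=${1:-$(uname -s)}
  arch=${2:-$(uname -m)}
  if [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
    printf 'True\n'
  else
    printf 'False\n'
  fi
}

# Possible usage (assumed flag spelling, see the note above):
#   r2r serve --docker --config-name=local_llm --exclude-ollama="$(exclude_ollama_value)"
```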
Preparing Local LLMs
Next, make sure that you have all the necessary LLMs installed:
# in a separate terminal
ollama pull llama3.1
ollama pull mxbai-embed-large
ollama serve
# when running ollama inside Docker, instead:
# docker exec -it r2r-ollama-1 ollama pull llama3.1
# docker exec -it r2r-ollama-1 ollama pull mxbai-embed-large
When deploying R2R with a customized configuration, replace the model names in these commands with the ones your configuration specifies.
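Before starting R2R, it can help to confirm that the models were actually pulled. The sketch below is one way to do that, assuming the output of Ollama's ollama list command, which prints one model per line with the name and tag (for example llama3.1:latest) in the first column. The helper name missing_models is hypothetical and is not part of R2R or Ollama.

```shell
# Print each required model that does not appear in `ollama list` output.
# The listing is read from stdin; required model names are the arguments.
missing_models() {
  listing=$(cat)
  for model in "$@"; do
    # Match the first column either exactly or as "<model>:<tag>".
    if ! printf '%s\n' "$listing" |
      awk -v m="$model" 'index($1, m ":") == 1 || $1 == m { found = 1 } END { exit !found }'
    then
      printf '%s\n' "$model"
    fi
  done
}

# Possible usage with Ollama installed outside Docker:
#   ollama list | missing_models llama3.1 mxbai-embed-large
# With Ollama inside Docker (container name taken from the commands above):
#   docker exec r2r-ollama-1 ollama list | missing_models llama3.1 mxbai-embed-large
```

An empty result means every listed model is present; any name printed still needs an ollama pull.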
Configuration
R2R uses a TOML configuration file for managing settings, which you can read about here. For local setup, we’ll use the default local_llm configuration. This can be customized to your needs by setting up a standalone project.
For more information on how to configure R2R, visit here.
Summary
The above steps are all you need to get RAG up and running with local LLMs in R2R. For detailed setup and basic functionality, refer back to the [R2R Quickstart](/documentation/quickstart/introduction). For more advanced usage and customization options, refer to the [basic configuration](/documentation/configuration/introduction) or join the R2R Discord community.