Run R2R with Local LLMs
Overview
There are many amazing LLMs and embedding models that can be run locally. R2R fully supports using these models, giving you full control over your data and infrastructure.
Running models locally can be ideal for sensitive data handling, reducing API costs, or situations where internet connectivity is limited. While cloud-based LLMs often provide cutting-edge performance, local models offer a compelling balance of capability, privacy, and cost-effectiveness for many use cases.
Local LLM features are currently restricted to:
- Self-deployed instances
- Enterprise tier cloud accounts
Contact our sales team for Enterprise pricing and features.
Serving Local Models
For this cookbook, we'll serve our local models via Ollama. You can follow the instructions on the official Ollama website to install it.
R2R routes embedding and completion requests through LiteLLM. This means that if you are serving local models another way, any OpenAI-compatible endpoint can be called and routed to seamlessly.
We must first download the models that we wish to run and start our Ollama server. The following commands pull the models and start the Ollama server at http://localhost:11434.
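A minimal sketch, assuming the `llama3.1` and `mxbai-embed-large` models used by R2R's default local configuration; swap in any other Ollama-hosted models you prefer:

```bash
# Pull the completion and embedding models used in this cookbook
ollama pull llama3.1
ollama pull mxbai-embed-large

# Start the Ollama server on http://localhost:11434
# (run in a separate terminal, or skip if the Ollama app is already running)
ollama serve
```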
Configuring R2R
Now that our models have been loaded and our Ollama server is ready, we can launch our R2R server.
The standard distribution of R2R includes a configuration file for running `llama3.1` and `mxbai-embed-large`. If you wish to utilize other models, you must create a custom config file and pass it to your server.
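As a rough sketch, a custom config overrides the completion and embedding providers to point at your Ollama models. Exact key names vary between R2R versions, so copy the bundled local-LLM config that ships with R2R and edit the model names rather than writing one from scratch; the snippet below is illustrative only:

```toml
# my_local_config.toml -- illustrative example; key names may differ by R2R version
[completion]
provider = "litellm"

  [completion.generation_config]
  # LiteLLM routes "ollama/..." model names to the local Ollama server
  model = "ollama/llama3.1"
  temperature = 0.1

[embedding]
provider = "ollama"
base_model = "mxbai-embed-large"
base_dimension = 1024
```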
We launch R2R by specifying this configuration file:
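The exact invocation depends on your R2R version and installation method; a typical launch with the bundled local-LLM config, or a custom one passed by path, looks roughly like the following — check `r2r serve --help` for the flags available in your version:

```bash
# Launch R2R in Docker with the bundled local-LLM configuration
r2r serve --docker --config-name=local_llm

# Or point at a custom configuration file instead
# r2r serve --docker --config-path=/abs/path/to/my_local_config.toml
```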
Since we're serving with Docker, the R2R dashboard opens for us once R2R launches successfully. We can upload a document and watch requests hit our Ollama server.