Run R2R with Local LLMs
Overview
There are many amazing LLMs and embedding models that can be run locally. R2R fully supports using these models, giving you full control over your data and infrastructure.
Running models locally can be ideal for sensitive data handling, reducing API costs, or situations where internet connectivity is limited. While cloud-based LLMs often provide cutting-edge performance, local models offer a compelling balance of capability, privacy, and cost-effectiveness for many use cases.
Local LLM features are currently restricted to:
- Self-deployed instances
- Enterprise tier cloud accounts
Contact our sales team for Enterprise pricing and features.
Serving Local Models
For this cookbook, we'll serve our local models via Ollama. You can follow the instructions on the official Ollama website to install it.
R2R routes embedding and completion requests through LiteLLM. This means that if you are serving local models another way, any OpenAI-compatible endpoint can be called and routed to seamlessly.
We must first download the models that we wish to run and start our Ollama server. The following commands pull the models and start the Ollama server at http://localhost:11434.
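A minimal sketch, assuming the `llama3.1` and `mxbai-embed-large` models used by R2R's default local configuration; swap in any other Ollama-hosted models you prefer:

```bash
# Pull the completion and embedding models used in this cookbook
ollama pull llama3.1
ollama pull mxbai-embed-large

# Start the Ollama server on http://localhost:11434
# (run in a separate terminal, or skip if the Ollama app is already running)
ollama serve
```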
Configuring R2R
Now that our models have been loaded and our Ollama server is ready, we can launch our R2R server.
The standard distribution of R2R includes a configuration file for running `llama3.1` and `mxbai-embed-large`. If you wish to utilize other models, you must create a custom config file and pass it to your server.
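As a rough sketch, a custom config overrides the completion and embedding providers to point at your Ollama models. Exact key names vary between R2R versions, so copy the bundled local-LLM config that ships with R2R and edit the model names rather than writing one from scratch; the snippet below is illustrative only:

```toml
# my_local_config.toml -- illustrative example; key names may differ by R2R version
[completion]
provider = "litellm"

  [completion.generation_config]
  # LiteLLM routes "ollama/..." model names to the local Ollama server
  model = "ollama/llama3.1"
  temperature = 0.1

[embedding]
provider = "ollama"
base_model = "mxbai-embed-large"
base_dimension = 1024
```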
We launch R2R by specifying this configuration file:
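The exact invocation depends on your R2R version and installation method; a typical launch with the bundled local-LLM config, or a custom one passed by path, looks roughly like the following — check `r2r serve --help` for the flags available in your version:

```bash
# Launch R2R in Docker with the bundled local-LLM configuration
r2r serve --docker --config-name=local_llm

# Or point at a custom configuration file instead
# r2r serve --docker --config-path=/abs/path/to/my_local_config.toml
```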
Since we're serving with Docker, the R2R dashboard opens for us once R2R launches successfully. We can upload a document and watch requests hit our Ollama server.