LLMs
Learn how to configure LLMs in your R2R deployment
R2R uses language models to generate responses based on retrieved context. You can configure R2R’s server-side LLM generation settings in r2r.toml:
[completion]
provider = "litellm"
concurrent_request_limit = 16
[completion.generation_config]
model = "openai/gpt-4o"
temperature = 0.1
top_p = 1
max_tokens_to_sample = 1_024
stream = false
add_generation_kwargs = {}
Key generation configuration options:
- provider: The LLM provider (defaults to “LiteLLM” for maximum flexibility).
- concurrent_request_limit: Maximum number of concurrent LLM requests.
- model: The language model to use for generation.
- temperature: Controls the randomness of the output (0.0 to 1.0).
- top_p: Nucleus sampling parameter (0.0 to 1.0).
- max_tokens_to_sample: Maximum number of tokens to generate.
- stream: Enable/disable streaming of generated text.
- api_base: The base URL for remote communication, e.g. https://api.openai.com/v1
Serving select LLM providers
export OPENAI_API_KEY=your_openai_key
# .. set other environment variables
# Optional - Update default model
# Set 'model = "openai/gpt-4o-mini"' under [completion.generation_config] in `r2r.toml`
# then call `r2r serve --config-path=r2r.toml`
r2r serve
Supported models include:
- openai/gpt-4o
- openai/gpt-4-turbo
- openai/gpt-4
- openai/gpt-4o-mini
For a complete list of supported OpenAI models and detailed usage instructions, please refer to the LiteLLM OpenAI documentation.
Runtime Configuration of LLM Provider
R2R supports runtime configuration of the LLM provider, allowing you to dynamically change the model or provider for each request. This flexibility enables you to use different models or providers based on specific requirements or use cases.
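One way to organize per-request settings is to keep a small table of overrides keyed by use case and merge it onto a set of defaults. The sketch below is illustrative only: `DEFAULTS`, `PROFILES`, and `generation_config_for` are hypothetical names, not part of the R2R API, and the profile values are arbitrary. The resulting dictionary is the kind of value you would pass as the generation settings of a request.

```python
# Baseline settings, mirroring the server-side example configuration.
DEFAULTS = {"model": "openai/gpt-4o", "temperature": 0.1, "stream": False}

# Hypothetical per-use-case overrides; only the keys that differ are listed.
PROFILES = {
    "factual": {"temperature": 0.0},
    "brainstorm": {"model": "openai/gpt-4o-mini", "temperature": 0.7},
}


def generation_config_for(use_case: str) -> dict:
    """Return a new dict: the defaults with the use case's overrides applied.

    Unknown use cases fall back to the defaults unchanged.
    """
    return {**DEFAULTS, **PROFILES.get(use_case, {})}


print(generation_config_for("brainstorm"))
```

Because the helper builds a new dictionary on every call, a request-specific change never leaks into the shared defaults.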
Combining Search and Generation
When performing a RAG query, you can dynamically set the LLM generation settings:
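from r2r import R2RClient

# Assumes an R2R server is running locally on the default port
client = R2RClient("http://localhost:7272")

response = client.rag(
    "What are the latest advancements in quantum computing?",
    rag_generation_config={
        "stream": False,
        "model": "openai/gpt-4o-mini",
        "temperature": 0.7,
        "max_tokens": 150
    }
)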
For more detailed information on configuring other search and RAG settings, please refer to the RAG Configuration documentation.
Next Steps
For more detailed information on configuring specific components of R2R, please refer to the following pages: