LLMs
Learn how to configure LLMs in your R2R deployment
R2R uses language models to generate responses based on retrieved context. You can configure R2R's server-side LLM generation settings in the `r2r.toml` file:
Key generation configuration options:
- `provider`: The LLM provider (defaults to `litellm` for maximum flexibility).
- `concurrent_request_limit`: Maximum number of concurrent LLM requests.
- `model`: The language model to use for generation.
- `temperature`: Controls the randomness of the output (0.0 to 1.0).
- `top_p`: Nucleus sampling parameter (0.0 to 1.0).
- `max_tokens_to_sample`: Maximum number of tokens to generate.
- `stream`: Enable/disable streaming of generated text.
- `api_base`: The base URL for remote communication, e.g. `https://api.openai.com/v1`.
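To make these options concrete, here is an illustrative `r2r.toml` fragment. The key names mirror the options listed above; the section names (`[completion]`, `[completion.generation_config]`) and the default values shown are assumptions that may differ across R2R versions, so check the config file shipped with your deployment.

```toml
[completion]
provider = "litellm"
concurrent_request_limit = 16

  [completion.generation_config]
  model = "openai/gpt-4o"
  temperature = 0.1
  top_p = 1.0
  max_tokens_to_sample = 1024
  stream = false
  api_base = "https://api.openai.com/v1"
```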
Serving select LLM providers
R2R can serve models from a range of providers, including:
- OpenAI
- Azure
- Anthropic
- Vertex AI
- AWS Bedrock
- Groq
- Ollama
- Cohere
- Anyscale
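Under LiteLLM routing, the provider is typically selected by prefixing the model name with the provider identifier. A small sketch of this convention follows; the specific model identifiers are illustrative assumptions, so check each provider's current model list before using them.

```python
# With LiteLLM, the provider is implied by the "provider/model" prefix
# in the model name. These identifiers are illustrative examples only.
models = {
    "openai": "openai/gpt-4o-mini",
    "anthropic": "anthropic/claude-3-5-sonnet-20240620",
    "ollama": "ollama/llama3.1",
    "groq": "groq/llama-3.1-8b-instant",
}

# Extract the provider from each model name.
providers = {name.split("/", 1)[0] for name in models.values()}
```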
Supported OpenAI models include:
- openai/gpt-4o
- openai/gpt-4-turbo
- openai/gpt-4
- openai/gpt-4o-mini
For a complete list of supported OpenAI models and detailed usage instructions, please refer to the LiteLLM OpenAI documentation.
Runtime Configuration of LLM Provider
R2R supports runtime configuration of the LLM provider, allowing you to dynamically change the model or provider for each request. This flexibility enables you to use different models or providers based on specific requirements or use cases.
Combining Search and Generation
When performing a RAG query, you can set the LLM generation settings dynamically at request time.
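As a sketch of per-request generation settings, the snippet below builds a generation config and shows (commented out) how it might be passed to a RAG call. The `R2RClient` import, server URL, and `rag_generation_config` parameter name are assumptions based on the R2R Python client and may differ between versions.

```python
# Per-request generation settings for a RAG query (sketch).
# Any LiteLLM-style "provider/model" identifier can be used here;
# the model name below is an illustrative assumption.
rag_generation_config = {
    "model": "anthropic/claude-3-5-sonnet-20240620",
    "temperature": 0.7,
    "max_tokens_to_sample": 512,
    "stream": False,
}

# Assumed client usage (requires the `r2r` package and a running server):
# from r2r import R2RClient
# client = R2RClient("http://localhost:7272")
# response = client.retrieval.rag(
#     query="What is R2R?",
#     rag_generation_config=rag_generation_config,
# )
```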
For more detailed information on configuring other search and RAG settings, please refer to the RAG Configuration documentation.
Next Steps
For more detailed information on configuring specific components of R2R, please refer to the following pages: