LLMs
Configure your LLM provider
Language Model System
R2R uses Large Language Models (LLMs) as the core reasoning engine for RAG operations, providing sophisticated text generation and analysis capabilities.
R2R routes LLM requests through LiteLLM because of its provider flexibility. Read more about LiteLLM here.
LLM Configuration
The LLM system can be customized through the `completion` section in your `r2r.toml` file:
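As a sketch, the section might look like the following; the exact keys and defaults depend on your R2R version, and the model and limit values here are illustrative:

```toml
[completion]
provider = "litellm"
concurrent_request_limit = 16      # cap on simultaneous requests to the provider

  [completion.generation_config]
  model = "openai/gpt-4o"          # any LiteLLM-style "provider/model" identifier
  temperature = 0.1
  top_p = 1.0
  max_tokens_to_sample = 1024
  stream = false
```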
Relevant environment variables for the above configuration include `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `AZURE_API_KEY`, etc., depending on your chosen provider.
Advanced LLM Features in R2R
R2R leverages several advanced LLM features to provide robust text generation:
Concurrent Request Management
The system implements sophisticated request handling with rate limiting and concurrency control:
- Rate Limiting: Prevents API throttling through intelligent request scheduling
- Concurrent Processing: Manages multiple LLM requests efficiently
- Error Handling: Implements retry logic with exponential backoff
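The sketch below illustrates the general pattern (a bounded semaphore plus exponential backoff with jitter); it is not R2R's internal implementation, and `call_llm` stands in for a real provider request:

```python
# Minimal sketch of rate-limited, concurrent request handling with retries.
# The helper names and limits here are illustrative, not R2R internals.
import asyncio
import random

CONCURRENT_REQUEST_LIMIT = 8   # hypothetical cap, mirroring concurrent_request_limit
MAX_RETRIES = 3

semaphore = asyncio.Semaphore(CONCURRENT_REQUEST_LIMIT)

async def call_llm(prompt: str) -> str:
    """Stand-in for a single completion request to the provider."""
    await asyncio.sleep(0.1)  # simulate network latency
    return f"completion for: {prompt}"

async def call_with_retry(prompt: str) -> str:
    # The semaphore bounds in-flight requests; retries back off exponentially
    # with jitter to avoid hammering a throttled provider.
    async with semaphore:
        for attempt in range(MAX_RETRIES):
            try:
                return await call_llm(prompt)
            except Exception:
                if attempt == MAX_RETRIES - 1:
                    raise
                await asyncio.sleep(2 ** attempt + random.random())

async def main():
    prompts = [f"query {i}" for i in range(20)]
    results = await asyncio.gather(*(call_with_retry(p) for p in prompts))
    print(len(results), "completions")

asyncio.run(main())
```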
Performance Considerations
When configuring LLMs in R2R, consider these optimization strategies:
Concurrency Management:
- Adjust `concurrent_request_limit` based on provider limits
- Monitor API usage and adjust accordingly
- Consider implementing request caching for repeated queries

Model Selection:
- Balance model capabilities with latency requirements
- Consider cost per token for different providers
- Evaluate context window requirements

Resource Management:
- Monitor token usage with large responses
- Implement appropriate error handling and retry strategies
- Consider implementing fallback models for critical systems (a sketch follows this list)
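As an illustration of the fallback strategy, a request can walk an ordered model chain until one succeeds; `complete` here is a hypothetical helper standing in for your actual LLM call:

```python
# Hedged sketch of a fallback-model chain; complete() is a placeholder,
# not an R2R API, and the model identifiers are examples only.
def complete(model: str, prompt: str) -> str:
    raise RuntimeError(f"{model} unavailable")  # replace with a real provider call

def complete_with_fallback(prompt: str) -> str:
    models = ["openai/gpt-4o", "openai/gpt-4o-mini"]  # primary first, then fallbacks
    last_error: Exception | None = None
    for model in models:
        try:
            return complete(model, prompt)
        except Exception as err:
            last_error = err  # record the failure and try the next model
    raise last_error  # every model in the chain failed
```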
Serving select LLM providers
R2R, through LiteLLM, can serve models from providers including:
- OpenAI
- Azure
- Anthropic
- Vertex AI
- AWS Bedrock
- Groq
- Ollama
- Cohere
- Anyscale
For OpenAI, supported models include:
- openai/gpt-4o
- openai/gpt-4-turbo
- openai/gpt-4
- openai/gpt-4o-mini
For a complete list of supported OpenAI models and detailed usage instructions, please refer to the LiteLLM OpenAI documentation.
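To serve one of these models, point the generation config at it; the snippet below assumes the same `r2r.toml` layout sketched earlier:

```toml
[completion.generation_config]
model = "openai/gpt-4o-mini"   # swap in any supported "provider/model" identifier
```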
Runtime Configuration of LLM Provider
R2R supports runtime configuration of the LLM provider, allowing you to dynamically change the model or provider for each request. This flexibility enables you to use different models or providers based on specific requirements or use cases.
Combining Search and Generation
When performing a RAG query, you can dynamically set the LLM generation settings:
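For example, with the Python client (a minimal sketch; the method path and config keys follow recent R2R SDKs but may differ in your installed version):

```python
# Sketch: overriding the generation model per request at query time.
# The endpoint URL, method name, and config keys are assumptions to verify
# against your installed R2R version.
from r2r import R2RClient

client = R2RClient("http://localhost:7272")  # default local R2R endpoint

response = client.retrieval.rag(
    query="What are the key features of R2R?",
    rag_generation_config={
        "model": "anthropic/claude-3-5-sonnet-20241022",  # switch provider for this call
        "temperature": 0.2,
    },
)
print(response)
```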
For more detailed information on configuring other search and RAG settings, please refer to the RAG Configuration documentation.