Language Models (LLMs)

Configure and use multiple Language Model providers in R2R

Introduction

R2R’s LLMProvider supports multiple third-party Language Model (LLM) providers, offering flexibility in choosing and switching between different models based on your specific requirements. This guide provides an in-depth look at configuring and using various LLM providers within the R2R framework.

Architecture Overview

R2R’s LLM system is built on a flexible provider model:

  1. LLM Provider: An abstract base class that defines the common interface for all LLM providers.
  2. Specific LLM Providers: Concrete implementations for different LLM services (e.g., OpenAI, LiteLLM).

These providers work in tandem to ensure flexible and efficient language model integration.

Providers

LiteLLM Provider (Default)

The default LiteLLMProvider offers a unified interface for multiple LLM services.

Key features:

  • Support for OpenAI, Anthropic, Vertex AI, HuggingFace, Azure OpenAI, Ollama, Together AI, and OpenRouter
  • Consistent API across different LLM providers
  • Easy switching between models

OpenAI Provider

The OpenAILLM class provides direct integration with OpenAI’s models.

Key features:

  • Direct access to OpenAI’s API
  • Support for the latest OpenAI models
  • Fine-grained control over model parameters

Local Models

R2R supports running models locally through LiteLLM, using Ollama or other local inference engines. An example configuration follows the feature list below.

Key features:

  • Privacy-preserving local inference
  • Customizable model selection
  • Reduced latency for certain use cases
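
For example, a LiteLLM-routed local model served by Ollama might be configured as follows. The model name (ollama/llama3) is illustrative; it must match a model you have pulled into your local Ollama instance:

[completions]
provider = "litellm"

[completions.generation_config]
model = "ollama/llama3"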

Configuration

LLM Configuration

Update the completions section in your r2r.toml file:

[completions]
provider = "litellm"

[completions.generation_config]
model = "gpt-4"
temperature = 0.7
max_tokens = 150

The provided generation_config is used to establish the default generation parameters for your deployment. These settings can be overridden at runtime, offering flexibility in your application. You can adjust parameters:

  1. At the application level, by modifying the R2R configuration
  2. For individual requests, by passing custom parameters to the rag or get_completion methods
  3. Through API calls, by including specific parameters in your request payload

This allows you to fine-tune the behavior of your language model interactions on a per-use basis while maintaining a consistent baseline configuration.
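
For example, a per-request override might look like the sketch below. It assumes that get_completion accepts an optional generation_config argument and that R2R() can be constructed with defaults; the field names mirror the r2r.toml keys above, and the exact API may differ between R2R versions:

from r2r import R2R
from r2r.base.abstractions.llm import GenerationConfig

r2r = R2R()  # baseline settings come from r2r.toml

messages = [{"role": "user", "content": "Summarize the uploaded document."}]

# Per-request override: any field not set here falls back to the
# defaults configured under [completions.generation_config].
override_config = GenerationConfig(model="gpt-4", temperature=0.2, max_tokens=512)

response = r2r.get_completion(messages, generation_config=override_config)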

Security Best Practices

  1. API Key Management: Use environment variables or secure key management solutions for API keys (see the sketch after this list).
  2. Rate Limiting: Implement rate limiting to prevent abuse of LLM endpoints.
  3. Input Validation: Sanitize and validate all inputs before passing them to LLMs.
  4. Output Filtering: Implement content filtering for LLM outputs to prevent inappropriate content.
  5. Monitoring: Regularly monitor LLM usage and outputs for anomalies or misuse.
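
For the first point, the usual pattern is to read keys from the environment rather than hard-coding them in source. The variable name below is the standard one read by OpenAI clients and LiteLLM:

import os

# Fail fast if the key is missing instead of embedding it in source control.
api_key = os.environ.get("OPENAI_API_KEY")
if api_key is None:
    raise RuntimeError("OPENAI_API_KEY is not set; export it before starting R2R.")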

Custom LLM Providers in R2R

LLM Provider Structure

The LLM system in R2R is built on two main components:

  1. LLMConfig: A configuration class for LLM providers.
  2. LLMProvider: An abstract base class that defines the interface for all LLM providers.

LLMConfig

The LLMConfig class is used to configure LLM providers:

from r2r.base import ProviderConfig
from r2r.base.abstractions.llm import GenerationConfig
from typing import Optional

class LLMConfig(ProviderConfig):
    provider: Optional[str] = None
    generation_config: Optional[GenerationConfig] = None

    def validate(self) -> None:
        if not self.provider:
            raise ValueError("Provider must be set.")
        if self.provider and self.provider not in self.supported_providers:
            raise ValueError(f"Provider '{self.provider}' is not supported.")

    @property
    def supported_providers(self) -> list[str]:
        return ["litellm", "openai"]
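
As a quick illustration of how this configuration object is used, the sketch below builds a config for the LiteLLM provider and validates it. It passes generation_config as a plain dict, mirroring how the server configuration (shown later) hands over the parsed TOML; whether create also accepts a GenerationConfig instance depends on the implementation:

# Hypothetical usage of LLMConfig; see r2r.toml for the equivalent settings.
config = LLMConfig.create(
    provider="litellm",
    generation_config={"model": "gpt-4", "temperature": 0.7, "max_tokens": 150},
)
config.validate()  # raises ValueError if the provider is missing or unsupported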

LLMProvider

The LLMProvider is an abstract base class that defines the common interface for all LLM providers:

from abc import abstractmethod
from r2r.base import Provider
from r2r.base.abstractions.llm import GenerationConfig, LLMChatCompletion, LLMChatCompletionChunk

class LLMProvider(Provider):
    def __init__(self, config: LLMConfig) -> None:
        if not isinstance(config, LLMConfig):
            raise ValueError("LLMProvider must be initialized with a `LLMConfig`.")
        super().__init__(config)

    @abstractmethod
    def get_completion(
        self,
        messages: list[dict],
        generation_config: GenerationConfig,
        **kwargs,
    ) -> LLMChatCompletion:
        pass

    @abstractmethod
    def get_completion_stream(
        self,
        messages: list[dict],
        generation_config: GenerationConfig,
        **kwargs,
    ) -> LLMChatCompletionChunk:
        pass

Creating a Custom LLM Provider

To create a custom LLM provider, follow these steps:

  1. Create a new class that inherits from LLMProvider.
  2. Implement the required methods: get_completion and get_completion_stream.
  3. (Optional) Add any additional methods or attributes specific to your provider.

Here’s an example of a custom LLM provider:

import logging
from typing import Generator

from r2r.base import LLMProvider, LLMConfig, LLMChatCompletion, LLMChatCompletionChunk
from r2r.base.abstractions.llm import GenerationConfig

logger = logging.getLogger(__name__)

class CustomLLMProvider(LLMProvider):
    def __init__(self, config: LLMConfig) -> None:
        super().__init__(config)
        # Initialize any custom attributes or connections here
        self.custom_client = self._initialize_custom_client()

    def _initialize_custom_client(self):
        # Initialize your custom LLM client here
        pass

    def get_completion(
        self,
        messages: list[dict],
        generation_config: GenerationConfig,
        **kwargs,
    ) -> LLMChatCompletion:
        # Implement the logic to get a completion from your custom LLM
        response = self.custom_client.generate(messages, **generation_config.dict(), **kwargs)

        # Convert the response to LLMChatCompletion format
        return LLMChatCompletion(
            id=response.id,
            choices=[
                {
                    "message": {
                        "role": "assistant",
                        "content": response.text,
                    },
                    "finish_reason": response.finish_reason,
                }
            ],
            usage={
                "prompt_tokens": response.usage.prompt_tokens,
                "completion_tokens": response.usage.completion_tokens,
                "total_tokens": response.usage.total_tokens,
            },
        )

    def get_completion_stream(
        self,
        messages: list[dict],
        generation_config: GenerationConfig,
        **kwargs,
    ) -> Generator[LLMChatCompletionChunk, None, None]:
        # Implement the logic to get a streaming completion from your custom LLM
        stream = self.custom_client.generate_stream(messages, **generation_config.dict(), **kwargs)

        for chunk in stream:
            yield LLMChatCompletionChunk(
                id=chunk.id,
                choices=[
                    {
                        "delta": {
                            "role": "assistant",
                            "content": chunk.text,
                        },
                        "finish_reason": chunk.finish_reason,
                    }
                ],
            )

    # Add any additional methods specific to your custom provider
    def custom_method(self, *args, **kwargs):
        # Implement custom functionality
        pass

Registering and Using the Custom Provider

To use your custom LLM provider in R2R:

  1. Update the LLMConfig class to include your custom provider:

class LLMConfig(ProviderConfig):
    # ...existing code...

    @property
    def supported_providers(self) -> list[str]:
        return ["litellm", "openai", "custom"]  # Add your custom provider here
  2. Update your R2R configuration to use the custom provider:

[completions]
provider = "custom"

[completions.generation_config]
model = "your-custom-model"
temperature = 0.7
max_tokens = 150
  3. In your R2R application, register the custom provider:

from r2r import R2R
from r2r.base import LLMConfig
from your_module import CustomLLMProvider

def get_llm_provider(config: LLMConfig):
    if config.provider == "custom":
        return CustomLLMProvider(config)
    # ... handle other providers ...

r2r = R2R(llm_provider_factory=get_llm_provider)

Now you can use your custom LLM provider seamlessly within your R2R application:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
]

response = r2r.get_completion(messages)
print(response.choices[0].message.content)

By following this structure, you can integrate any LLM or service into R2R, maintaining consistency with the existing system while adding custom functionality as needed.

Prompt Engineering

R2R supports advanced prompt engineering techniques:

  1. Template Management: Create and manage reusable prompt templates.
  2. Dynamic Prompts: Generate prompts dynamically based on context or user input.
  3. Few-shot Learning: Incorporate examples in your prompts for better results, as illustrated in the sketch below.
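
R2R's own prompt-template API is not shown here; the sketch below simply illustrates the idea of a reusable, dynamically filled few-shot template in plain Python. The resulting messages can be passed to get_completion as in the earlier examples:

# A reusable few-shot template; the placeholder is filled at request time.
FEW_SHOT_TEMPLATE = """You are a sentiment classifier.

Example: "The service was fantastic." -> positive
Example: "I waited an hour for nothing." -> negative

Classify the following review:
"{review}" ->"""

def build_messages(review: str) -> list[dict]:
    # Dynamic prompt: the user input is injected into the template.
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": FEW_SHOT_TEMPLATE.format(review=review)},
    ]

messages = build_messages("The product arrived broken.")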

Troubleshooting

Common issues and solutions:

  1. API Key Errors: Ensure your API keys are correctly set and have the necessary permissions.
  2. Rate Limiting: Implement exponential backoff for retries on rate limit errors, as sketched after this list.
  3. Context Length Errors: Be mindful of the maximum context length for your chosen model.
  4. Model Availability: Ensure the requested model is available and properly configured.
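
For the rate-limiting point above, a simple backoff wrapper around the provider interface might look like the following sketch. The exception type worth retrying depends on the upstream provider, so a generic Exception is used here as a placeholder:

import random
import time

def get_completion_with_backoff(provider, messages, generation_config, max_retries=5):
    # Retry with exponential backoff plus jitter on transient failures,
    # e.g. rate-limit errors returned by the upstream LLM API.
    for attempt in range(max_retries):
        try:
            return provider.get_completion(messages, generation_config)
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep((2 ** attempt) + random.random())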

Performance Considerations

  1. Batching: Use batching for multiple, similar requests to improve throughput.
  2. Streaming: Utilize streaming for long-form content generation to improve perceived latency (see the example after this list).
  3. Model Selection: Balance between model capability and inference speed based on your use case.
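
For the streaming point, the provider interface exposes get_completion_stream. The sketch below consumes it incrementally, assuming chunks follow the OpenAI-style shape used in the custom provider example above; llm_provider stands for whichever configured provider instance your deployment exposes:

def stream_answer(llm_provider, messages, generation_config):
    # Print tokens as they arrive instead of waiting for the full completion.
    for chunk in llm_provider.get_completion_stream(messages, generation_config):
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)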

Server Configuration

The R2RConfig class handles the configuration of various components, including LLMs. Here’s a simplified version:

from typing import Any

from r2r.base import LLMConfig
from r2r.base.abstractions.llm import GenerationConfig

class R2RConfig:
    REQUIRED_KEYS: dict[str, list] = {
        # ... other keys ...
        "completions": ["provider"],
        # ... other keys ...
    }

    def __init__(self, config_data: dict[str, Any]):
        # Load and validate configuration
        # ...

        # Override GenerationConfig defaults while completions is still a dict
        GenerationConfig.set_default(**self.completions.get("generation_config", {}))

        # Set LLM configuration
        self.completions = LLMConfig.create(**self.completions)

        # ... other initialization ...

This configuration system allows for flexible setup of LLM providers and their default parameters.
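
For instance, constructing the configuration from the r2r.toml shown earlier could look like the sketch below. It assumes the file sits in the working directory and uses the third-party toml package for parsing; R2R may also ship its own loader helper:

import toml

# Parse r2r.toml into a plain dict and hand it to R2RConfig, which
# validates the required sections (including "completions").
config_data = toml.load("r2r.toml")
config = R2RConfig(config_data)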

Conclusion

R2R’s LLM system provides a flexible and powerful foundation for integrating various language models into your applications. By understanding the available providers, configuration options, and best practices, you can effectively leverage LLMs to enhance your R2R-based projects.

For further customization and advanced use cases, refer to the R2R API Documentation and configuration guide.