Introduction

R2R supports multiple third-party Large Language Model (LLM) providers through its LLMProvider abstraction, offering flexibility in choosing and switching between models based on your specific requirements. This guide provides an in-depth look at configuring and using the various LLM providers within the R2R framework.

Architecture Overview

R2R’s LLM system is built on a flexible provider model:

  1. LLMProvider: An abstract base class that defines the common interface for all LLM providers.
  2. Specific LLM Providers: Concrete implementations for different LLM services (e.g., OpenAI, LiteLLM).

These providers work in tandem to ensure flexible and efficient language model integration.

Providers

LiteLLM Provider (Default)

The default LiteLLMProvider offers a unified interface for multiple LLM services.

Key features:

  • Support for OpenAI, Anthropic, Vertex AI, Hugging Face, Azure OpenAI, Ollama, Together AI, and OpenRouter
  • Consistent API across different LLM providers
  • Easy switching between models

OpenAI Provider

The OpenAILLM class provides direct integration with OpenAI’s models.

Key features:

  • Direct access to OpenAI’s API
  • Support for the latest OpenAI models
  • Fine-grained control over model parameters

Local Models

Support for running models locally with Ollama or other local inference engines, routed through LiteLLM.

Key features:

  • Privacy-preserving local inference
  • Customizable model selection
  • Reduced latency for certain use cases

Configuration

LLM Configuration

Update the completions section in your r2r.toml file:

[completions]
provider = "litellm"

[completions.generation_config]
model = "gpt-4"
temperature = 0.7
max_tokens = 150

The provided generation_config is used to establish the default generation parameters for your deployment. These settings can be overridden at runtime, offering flexibility in your application. You can adjust parameters:

  1. At the application level, by modifying the R2R configuration
  2. For individual requests, by passing custom parameters to the rag or get_completion methods
  3. Through API calls, by including specific parameters in your request payload

This allows you to fine-tune the behavior of your language model interactions on a per-use basis while maintaining a consistent baseline configuration.
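
For example, a per-request override might look like the sketch below, assuming an initialized R2R application object named r2r; the exact keyword argument (rag_generation_config here) and the GenerationConfig field names mirror the configuration above but may differ between R2R versions:

from r2r.base.abstractions.llm import GenerationConfig

# Hypothetical per-request override: the baseline from r2r.toml still applies to
# any parameters not specified here. With the LiteLLM provider, the model string
# can also point at other backends (e.g. "ollama/llama3" or an Anthropic model).
response = r2r.rag(
    query="Summarize the key findings in my documents.",
    rag_generation_config=GenerationConfig(model="gpt-4", temperature=0.2, max_tokens=300),
)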

Security Best Practices

  1. API Key Management: Use environment variables or secure key management solutions for API keys (see the sketch after this list).
  2. Rate Limiting: Implement rate limiting to prevent abuse of LLM endpoints.
  3. Input Validation: Sanitize and validate all inputs before passing them to LLMs.
  4. Output Filtering: Implement content filtering for LLM outputs to prevent inappropriate content.
  5. Monitoring: Regularly monitor LLM usage and outputs for anomalies or misuse.
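
As a minimal illustration of point 1, read provider credentials from the environment rather than hard-coding them; OPENAI_API_KEY is the variable the OpenAI and LiteLLM clients conventionally read:

import os

# Fail fast if the key is missing instead of embedding a secret in source control.
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; export it before starting R2R.")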

Custom LLM Providers in R2R

LLM Provider Structure

The LLM system in R2R is built on two main components:

  1. LLMConfig: A configuration class for LLM providers.
  2. LLMProvider: An abstract base class that defines the interface for all LLM providers.

LLMConfig

The LLMConfig class is used to configure LLM providers:

from r2r.base import ProviderConfig
from r2r.base.abstractions.llm import GenerationConfig
from typing import Optional

class LLMConfig(ProviderConfig):
    provider: Optional[str] = None
    generation_config: Optional[GenerationConfig] = None

    def validate(self) -> None:
        if not self.provider:
            raise ValueError("Provider must be set.")
        if self.provider and self.provider not in self.supported_providers:
            raise ValueError(f"Provider '{self.provider}' is not supported.")

    @property
    def supported_providers(self) -> list[str]:
        return ["litellm", "openai"]

LLMProvider

The LLMProvider is an abstract base class that defines the common interface for all LLM providers:

from abc import abstractmethod
from r2r.base import Provider
from r2r.base.abstractions.llm import GenerationConfig, LLMChatCompletion, LLMChatCompletionChunk

class LLMProvider(Provider):
    def __init__(self, config: LLMConfig) -> None:
        if not isinstance(config, LLMConfig):
            raise ValueError("LLMProvider must be initialized with a `LLMConfig`.")
        super().__init__(config)

    @abstractmethod
    def get_completion(
        self,
        messages: list[dict],
        generation_config: GenerationConfig,
        **kwargs,
    ) -> LLMChatCompletion:
        pass

    @abstractmethod
    def get_completion_stream(
        self,
        messages: list[dict],
        generation_config: GenerationConfig,
        **kwargs,
    ) -> LLMChatCompletionChunk:
        pass

Creating a Custom LLM Provider

To create a custom LLM provider, follow these steps:

  1. Create a new class that inherits from LLMProvider.
  2. Implement the required methods: get_completion and get_completion_stream.
  3. (Optional) Add any additional methods or attributes specific to your provider.

Here’s an example of a custom LLM provider:

import logging
from typing import Generator
from r2r.base import LLMProvider, LLMConfig, LLMChatCompletion, LLMChatCompletionChunk
from r2r.base.abstractions.llm import GenerationConfig

logger = logging.getLogger(__name__)

class CustomLLMProvider(LLMProvider):
    def __init__(self, config: LLMConfig) -> None:
        super().__init__(config)
        # Initialize any custom attributes or connections here
        self.custom_client = self._initialize_custom_client()

    def _initialize_custom_client(self):
        # Initialize your custom LLM client here
        pass

    def get_completion(
        self,
        messages: list[dict],
        generation_config: GenerationConfig,
        **kwargs,
    ) -> LLMChatCompletion:
        # Implement the logic to get a completion from your custom LLM
        response = self.custom_client.generate(messages, **generation_config.dict(), **kwargs)

        # Convert the response to LLMChatCompletion format
        return LLMChatCompletion(
            id=response.id,
            choices=[
                {
                    "message": {
                        "role": "assistant",
                        "content": response.text
                    },
                    "finish_reason": response.finish_reason
                }
            ],
            usage={
                "prompt_tokens": response.usage.prompt_tokens,
                "completion_tokens": response.usage.completion_tokens,
                "total_tokens": response.usage.total_tokens
            }
        )

    def get_completion_stream(
        self,
        messages: list[dict],
        generation_config: GenerationConfig,
        **kwargs,
    ) -> Generator[LLMChatCompletionChunk, None, None]:
        # Implement the logic to get a streaming completion from your custom LLM
        stream = self.custom_client.generate_stream(messages, **generation_config.dict(), **kwargs)

        for chunk in stream:
            yield LLMChatCompletionChunk(
                id=chunk.id,
                choices=[
                    {
                        "delta": {
                            "role": "assistant",
                            "content": chunk.text
                        },
                        "finish_reason": chunk.finish_reason
                    }
                ]
            )

    # Add any additional methods specific to your custom provider
    def custom_method(self, *args, **kwargs):
        # Implement custom functionality
        pass

Registering and Using the Custom Provider

To use your custom LLM provider in R2R:

  1. Update the LLMConfig class to include your custom provider:
class LLMConfig(ProviderConfig):
    # ...existing code...

    @property
    def supported_providers(self) -> list[str]:
        return ["litellm", "openai", "custom"]  # Add your custom provider here
  2. Update your R2R configuration to use the custom provider:
[completions]
provider = "custom"

[completions.generation_config]
model = "your-custom-model"
temperature = 0.7
max_tokens = 150
  3. In your R2R application, register the custom provider:
from r2r import R2R
from r2r.base import LLMConfig
from your_module import CustomLLMProvider

def get_llm_provider(config: LLMConfig):
    if config.provider == "custom":
        return CustomLLMProvider(config)
    # ... handle other providers ...

r2r = R2R(llm_provider_factory=get_llm_provider)

Now you can use your custom LLM provider seamlessly within your R2R application:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
]

response = r2r.get_completion(messages)
print(response.choices[0].message.content)

By following this structure, you can integrate any LLM or service into R2R, maintaining consistency with the existing system while adding custom functionality as needed.

Prompt Engineering

R2R supports advanced prompt engineering techniques:

  1. Template Management: Create and manage reusable prompt templates.
  2. Dynamic Prompts: Generate prompts dynamically based on context or user input.
  3. Few-shot Learning: Incorporate examples in your prompts for better results (see the sketch below).
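
For instance, a minimal few-shot prompt is simply a message list that alternates example inputs and outputs before the real query, using the same chat-message format and get_completion call shown in the registration example above:

few_shot_messages = [
    {"role": "system", "content": "Classify each support ticket as 'bug', 'feature', or 'question'."},
    # Worked examples the model can imitate
    {"role": "user", "content": "The app crashes when I upload a PDF."},
    {"role": "assistant", "content": "bug"},
    {"role": "user", "content": "Please add a dark mode."},
    {"role": "assistant", "content": "feature"},
    # The actual input to classify
    {"role": "user", "content": "How do I reset my password?"},
]

response = r2r.get_completion(few_shot_messages)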

Troubleshooting

Common issues and solutions:

  1. API Key Errors: Ensure your API keys are correctly set and have the necessary permissions.
  2. Rate Limiting: Implement exponential backoff for retries on rate limit errors (see the sketch after this list).
  3. Context Length Errors: Be mindful of the maximum context length for your chosen model.
  4. Model Availability: Ensure the requested model is available and properly configured.
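
A minimal retry sketch for point 2; the function and parameter names are illustrative, not part of the R2R API:

import random
import time

def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Retry fn with exponential backoff plus jitter; re-raise after the final attempt."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:  # narrow this to your provider's rate-limit exception type
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))

# Usage: response = call_with_backoff(lambda: r2r.get_completion(messages))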

Performance Considerations

  1. Batching: Use batching for multiple, similar requests to improve throughput.
  2. Streaming: Utilize streaming for long-form content generation to improve perceived latency (see the sketch after this list).
  3. Model Selection: Balance between model capability and inference speed based on your use case.
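
For point 2, here is a minimal sketch of consuming a streamed completion, assuming an LLMProvider instance named provider and chunks that follow the OpenAI chat-completion chunk schema; the exact field access may differ for a custom provider:

# Print tokens as they arrive instead of waiting for the full response.
for chunk in provider.get_completion_stream(messages, generation_config):
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
print()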

Server Configuration

The R2RConfig class handles the configuration of various components, including LLMs. Here’s a simplified version:

from typing import Any

class R2RConfig:
    REQUIRED_KEYS: dict[str, list] = {
        # ... other keys ...
        "completions": ["provider"],
        # ... other keys ...
    }

    def __init__(self, config_data: dict[str, Any]):
        # Load and validate configuration
        # ...

        # Set LLM configuration
        self.completions = LLMConfig.create(**self.completions)

        # Override GenerationConfig defaults with values from the config file
        if self.completions.generation_config:
            GenerationConfig.set_default(**self.completions.generation_config.dict())

        # ... other initialization ...

This configuration system allows for flexible setup of LLM providers and their default parameters.
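
As a sketch of driving the simplified class above from r2r.toml (the shipped R2RConfig may expose its own loader, and tomllib requires Python 3.11+):

import tomllib

# Parse r2r.toml and hand the resulting dict to the (simplified) constructor above.
with open("r2r.toml", "rb") as f:
    config_data = tomllib.load(f)

config = R2RConfig(config_data)
print(config.completions.generation_config)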

Conclusion

R2R’s LLM system provides a flexible and powerful foundation for integrating various language models into your applications. By understanding the available providers, configuration options, and best practices, you can effectively leverage LLMs to enhance your R2R-based projects.

For further customization and advanced use cases, refer to the R2R API Documentation and configuration guide.