Introduction

R2RConfig uses a TOML-based configuration system to customize various aspects of R2R’s functionality. This guide provides a detailed overview of how to configure R2R, including all available options and their meanings.

Configuration File Structure

The R2R configuration is stored in a TOML file, which defaults to r2r.toml. The file is divided into several sections, each corresponding to a different aspect of the R2R system:

  • Authentication
  • Completion (LLM)
  • Cryptography
  • Database
  • Embedding
  • Evaluation
  • Ingestion
  • Knowledge Graph
  • Logging
  • Prompt Management

Loading a Configuration

To use a custom configuration, you can load it when initializing R2R:

from r2r import R2RConfig, R2RBuilder

# Load a custom configuration
config = R2RConfig.from_toml("path/to/your/r2r.toml")
r2r = R2RBuilder(config).build()

# Or use a preset configuration
r2r = R2RBuilder(config_name="default").build()

Configuration Sections

Authentication

Refer to the AuthProvider to learn more about how R2R supports auth providers.

[auth]
provider = "r2r"
access_token_lifetime_in_minutes = 60
refresh_token_lifetime_in_days = 7
require_authentication = false
require_email_verification = false
default_admin_email = "[email protected]"
default_admin_password = "change_me_immediately"
  • provider: Authentication provider. Currently, only “r2r” is supported.
  • access_token_lifetime_in_minutes: Lifespan of access tokens in minutes.
  • refresh_token_lifetime_in_days: Lifespan of refresh tokens in days.
  • require_authentication: If true, all secure routes require authentication. Otherwise, non-authenticated requests mock superuser access.
  • require_email_verification: If true, email verification is required for new accounts.
  • default_admin_email and default_admin_password: Credentials for the default admin account.

Completion (LLM)

Refer to the LLMProvider to learn more about how R2R supports LLM providers.


[completion]
provider = "litellm"
concurrent_request_limit = 16

  [completion.generation_config]
  model = "openai/gpt-4o"
  temperature = 0.1
  top_p = 1
  max_tokens_to_sample = 1_024
  stream = false
  add_generation_kwargs = { }
  • provider: LLM provider. Options include “litellm” and “openai”.
  • concurrent_request_limit: Maximum number of concurrent requests allowed.
  • generation_config: Detailed configuration for text generation.
    • model: The specific LLM model to use.
    • temperature: Controls randomness in generation (0.0 to 1.0).
    • top_p: Parameter for nucleus sampling.
    • max_tokens_to_sample: Maximum number of tokens to generate.
    • Other parameters control various aspects of text generation.

Cryptography

Refer to the CryptoProvider to learn more about how R2R supports cryptography.

[crypto]
provider = "bcrypt"
  • provider: Cryptography provider for password hashing. Currently, only “bcrypt” is supported.

Database

Refer to the DatabaseProvider to learn more about how R2R supports databases.

[database]
provider = "postgres"
  • provider: Database provider. Only “postgres” is supported.
  • user: Default username for accessing database.
  • password: Default password for accessing database.
  • host: Default host for accessing database.
  • port: Default port for accessing database.
  • db_name: Default db_name for accessing database.

Embedding

Refer to the EmbeddingProvider to learn more about how R2R supports embeddings.

[embedding]
provider = "litellm"
base_model = "text-embedding-3-small"
base_dimension = 512
batch_size = 128
add_title_as_prefix = false
rerank_model = "None"
concurrent_request_limit = 256
  • provider: Embedding provider. Options include “ollama”, “openai” and “sentence-transformers”.
  • base_model: The specific embedding model to use.
  • base_dimension: Dimension of the embedding vectors.
  • batch_size: Number of items to process in a single batch.
  • add_title_as_prefix: Whether to add the title as a prefix to the embedded text.
  • rerank_model: Model used for reranking, if any.
  • concurrent_request_limit: Maximum number of concurrent embedding requests.

Evaluation

[eval]
provider = "None"
  • provider: Evaluation provider. Set to “None” to disable evaluation functionality.

Ingestion

[ingestion]
excluded_parsers = [ "mp4" ]

[[ingestion.override_parsers]]
document_type = "pdf"
parser = "PDFParser"

[ingestion.text_splitter]
type = "recursive_character"
chunk_size = 512
chunk_overlap = 20
  • excluded_parsers: List of file types to exclude from parsing.
  • override_parsers: Custom parser settings for specific document types.
  • text_splitter: Configuration for splitting ingested text.
    • type: The algorithm used for splitting.
    • chunk_size: Maximum size of each chunk in characters.
    • chunk_overlap: Number of overlapping characters between chunks.

Knowledge Graph

Refer to the KGProvider to learn more about how R2R supports knowledge graphs.

[kg]
provider = "neo4j"
batch_size = 1

[kg.kg_extraction_config]
model = "gpt-4o"
temperature = 0.1
top_p = 1
max_tokens_to_sample = 1_024
stream = false
add_generation_kwargs = { }
  • provider: Specifies the backend used for storing and querying the knowledge graph. Options include “neo4j” and “None”.
  • batch_size: Determines how many text chunks are processed at once for knowledge extraction.
  • kg_extraction_config: Configures the language model used for extracting knowledge from text chunks.

Logging

[logging]
provider = "local"
log_table = "logs"
log_info_table = "log_info"
  • provider: Logging provider. Currently set to “local”.
  • log_table: Name of the table where logs are stored.
  • log_info_table: Name of the table where log information is stored.

Prompt Management

[prompt]
provider = "r2r"
  • provider: Prompt management provider. Currently set to “r2r”.

Advanced Configuration

Environment Variables

For sensitive information like API keys, it’s recommended to use environment variables instead of hardcoding them in the configuration file. R2R will automatically look for environment variables for certain settings.

Custom Providers

R2R supports custom providers for various components. To use a custom provider, you’ll need to implement the appropriate interface and register it with R2R. Refer to the developer documentation for more details on creating custom providers.

Configuration Validation

R2R performs validation on the configuration when it’s loaded. If there are any missing required fields or invalid values, an error will be raised. Always test your configuration in a non-production environment before deploying.

Best Practices

  1. Security: Never commit sensitive information like API keys or passwords to version control. Use environment variables instead.
  2. Modularity: Create separate configuration files for different environments (development, staging, production).
  3. Documentation: Keep your configuration files well-commented, especially when using custom or non-standard settings.
  4. Version Control: Track your configuration files in version control, but use .gitignore to exclude files with sensitive information.
  5. Regular Review: Periodically review and update your configuration to ensure it aligns with your current needs and best practices.

Troubleshooting

If you encounter issues with your configuration:

  1. Check the R2R logs for any error messages related to configuration.
  2. Verify that all required fields are present in your configuration file.
  3. Ensure that the values in your configuration are of the correct type (string, number, boolean, etc.).
  4. If using custom providers or non-standard settings, double-check the documentation or consult with the R2R community.

By following this guide, you should be able to configure R2R to suit your specific needs. Remember that R2R is highly customizable, so don’t hesitate to explore different configuration options to optimize your setup.