This guide shows how to use R2R to:

  1. Ingest files into R2R
  2. Search over ingested files
  3. Use your data as input to RAG (Retrieval-Augmented Generation)
  4. Perform basic user auth
  5. Observe and analyze an R2R deployment

Be sure to complete the installation instructions before continuing with this guide.

Introduction

R2R is an engine for building user-facing Retrieval-Augmented Generation (RAG) applications. At its core, R2R provides this service through an architecture of providers, services, and an integrated RESTful API. This cookbook provides a detailed walkthrough of how to interact with R2R. For a deeper dive, refer to the documentation on the R2R system architecture.

R2R Application Lifecycle

The following diagram illustrates how R2R assembles a user-facing application:

Hello R2R

R2R gives developers configurable vector search and RAG right out of the box. The example below writes a small text file, ingests it, and runs a RAG query through the Python client, assuming an R2R server is listening on localhost:7272:

core/examples/hello_r2r.py
from r2r import R2RClient

# Connect to an R2R server listening on localhost:7272
client = R2RClient("http://localhost:7272")

# Write a small sample document to disk
with open("test.txt", "w") as file:
    file.write("John is a person that works at Google.")

# Ingest the sample document
client.ingest_files(file_paths=["test.txt"])

# Call RAG directly
rag_response = client.rag(
    query="Who is John?",
    rag_generation_config={"model": "gpt-4o-mini", "temperature": 0.0},
)

# The response bundles the retrieved search results with the generated completion
results = rag_response["results"]
print(f"Search Results:\n{results['search_results']}")
print(f"Completion:\n{results['completion']}")

Configuring R2R

R2R is highly configurable. To customize your R2R deployment:

  1. Create a local configuration file named r2r.toml.
  2. In this file, override default settings as needed.

For example:

r2r.toml
[completion]
provider = "litellm"
concurrent_request_limit = 16

  [completion.generation_config]
  model = "openai/gpt-4o"
  temperature = 0.5

[chunking]
provider = "unstructured_local"
strategy = "auto"
chunking_strategy = "by_title"
new_after_n_chars = 512
max_characters = 1_024
combine_under_n_chars = 128
overlap = 20

Then, use the config-path argument to specify your custom configuration when launching R2R:

r2r serve --docker --config-path=r2r.toml

You can read more in the configuration documentation.

Document Ingestion and Management

R2R efficiently handles diverse document types using Postgres with pgvector, combining relational data management with vector search capabilities. This approach enables seamless ingestion, storage, and retrieval of multimodal data, while supporting flexible document management and user permissions.

Key features include:

  • Unique document_id generation for each ingested file
  • User and group permissioning through user_id and group_ids
  • Document versioning for tracking changes over time
  • Granular access to document content through chunk retrieval
  • Flexible deletion and update mechanisms

Note: all document management commands are gated at the user level, with the exception of superusers.

R2R offers a powerful data ingestion process that handles various file types including html, pdf, png, mp3, and txt. The ingestion process parses, chunks, embeds, and stores documents efficiently with a fully asynchronous pipeline. To demonstrate this functionality:

r2r ingest-sample-files

This command initiates the ingestion process, producing output similar to:

[
    {'message': 'Ingestion task queued successfully.', 'task_id': '6e27dfca-606d-422d-b73f-2d9e138661b4', 'document_id': '28a7266e-6cee-5dd2-b7fa-e4fc8f2b49c6'},
    {'message': 'Ingestion task queued successfully.', 'task_id': 'd37deef1-af08-4576-bd79-6d2a7fb6ec33', 'document_id': '2c91b66f-e960-5ff5-a482-6dd0a523d6a1'},
    {'message': 'Ingestion task queued successfully.', 'task_id': '4c1240f0-0692-4b67-8d2b-1428f71ea9bc', 'document_id': '638f0ed6-e0dc-5f86-9282-1f7f5243d9fa'},
    {'message': 'Ingestion task queued successfully.', 'task_id': '369abcea-79a2-480c-9ade-bbc89f5c500e', 'document_id': 'f25fd516-5cac-5c09-b120-0fc841270c7e'},
    {'message': 'Ingestion task queued successfully.', 'task_id': '7c99c168-97ee-4253-8a6f-694437f3e5cb', 'document_id': '77f67c65-6406-5076-8176-3844f3ef3688'},
    {'message': 'Ingestion task queued successfully.', 'task_id': '9a6f94b0-8fbc-4507-9435-53e0973aaad0', 'document_id': '9fbe403b-c11c-5aae-8ade-ef22980c3ad1'},
    {'message': 'Ingestion task queued successfully.', 'task_id': '61d0e2e0-45ec-43db-9837-ff4da5166ee9', 'document_id': '0032a7a7-cb2a-5d08-bfc1-93d3b760deb4'},
    {'message': 'Ingestion task queued successfully.', 'task_id': '1479390e-c295-47b0-a570-370b05b86c8b', 'document_id': 'f55616fb-7d48-53d5-89c2-15d7b8e3834c'},
    {'message': 'Ingestion task queued successfully.', 'task_id': '92f73a07-2286-4c42-ac02-d3eba0f252e0', 'document_id': '916b0ed7-8440-566f-98cf-ed7c0f5dba9b'}
]
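
Because each entry only says the task was queued, a caller usually wants to keep the task-to-document mapping for later lookups. The sketch below does that for two entries copied from the sample output. It also inspects the UUID versions: in this sample the document IDs are version 5 (name-based) and the task IDs are version 4 (random). That is consistent with document IDs being derived deterministically, though the source does not say how they are generated.

```python
import uuid

# Two entries copied from the sample ingestion output.
ingestion_response = [
    {
        "message": "Ingestion task queued successfully.",
        "task_id": "6e27dfca-606d-422d-b73f-2d9e138661b4",
        "document_id": "28a7266e-6cee-5dd2-b7fa-e4fc8f2b49c6",
    },
    {
        "message": "Ingestion task queued successfully.",
        "task_id": "d37deef1-af08-4576-bd79-6d2a7fb6ec33",
        "document_id": "2c91b66f-e960-5ff5-a482-6dd0a523d6a1",
    },
]

# Map each queued task to the document it will produce.
task_to_document = {
    entry["task_id"]: entry["document_id"] for entry in ingestion_response
}

# Collect the UUID versions seen for each kind of identifier.
document_versions = {uuid.UUID(e["document_id"]).version for e in ingestion_response}
task_versions = {uuid.UUID(e["task_id"]).version for e in ingestion_response}

for task_id, document_id in task_to_document.items():
    print(f"task {task_id} -> document {document_id}")
print("document UUID versions:", document_versions)
print("task UUID versions:", task_versions)
```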

Key features of the ingestion process:

  1. Unique document_id generation for each file
  2. Metadata association, including user_id and group_ids for document management
  3. Efficient parsing, chunking, and embedding of diverse file types

For more advanced document management techniques and user authentication details, refer to the user auth cookbook.

AI Powered Search

R2R offers powerful and highly configurable search capabilities, including vector search, hybrid search, and knowledge graph-enhanced search. These features allow for more accurate and contextually relevant information retrieval.

Vector search in R2R is highly configurable, so you can tune the search parameters to trade accuracy against speed. Here’s how to perform a basic vector search:

r2r search --query="What was Uber's profit in 2020?"

Key configurable parameters for vector search include:

  • use_vector_search: Enable or disable vector search.
  • index_measure: Choose between “cosine_distance”, “l2_distance”, or “max_inner_product”.
  • search_limit: Set the maximum number of results to return.
  • include_values: Include search score values in the results.
  • include_metadatas: Include element metadata in the results.
  • probes: Number of ivfflat index lists to query (higher increases accuracy but decreases speed).
  • ef_search: Size of the dynamic candidate list for HNSW index search (higher increases accuracy but decreases speed).
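
The choice of index_measure can change which chunk ranks first. The following pure-Python sketch illustrates the three measures on toy two-dimensional vectors. The vectors and labels are invented for illustration, and the code computes the measures directly rather than calling R2R or pgvector.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: ignores vector length, compares direction only."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean distance: sensitive to both direction and length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product: larger is more similar, and longer vectors score higher."""
    return sum(x * y for x, y in zip(a, b))


query = [1.0, 0.0]
# Hypothetical chunk embeddings: one perfectly aligned but long, one short and close by.
chunks = {"long_aligned": [3.0, 0.0], "short_nearby": [0.9, 0.1]}

best_by_cosine = min(chunks, key=lambda name: cosine_distance(query, chunks[name]))
best_by_l2 = min(chunks, key=lambda name: l2_distance(query, chunks[name]))
best_by_inner = max(chunks, key=lambda name: inner_product(query, chunks[name]))

print("cosine_distance picks:", best_by_cosine)
print("l2_distance picks:", best_by_l2)
print("max_inner_product picks:", best_by_inner)
```

Here cosine distance and inner product favor the aligned vector, while L2 distance favors the nearby one. The measures only agree in general when embeddings are normalized to unit length.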

R2R supports hybrid search, which combines traditional keyword-based search with vector search for improved results. Here’s how to perform a hybrid search:

r2r search --query="What was Uber's profit in 2020?" --use-hybrid-search
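
To make the idea of combining keyword and vector results concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge two ranked lists. The source does not state which fusion method R2R applies, so treat this as an illustration of the general technique. The function name, the constant k=60, and the document IDs are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of IDs; each list contributes 1 / (k + rank) per ID."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the same query from the two retrievers.
vector_ranking = ["d1", "d2", "d3"]
keyword_ranking = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(fused)
```

Documents that appear in both lists (d1 and d3) rise above documents found by only one retriever.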

R2R integrates knowledge graph capabilities to enhance search results with structured relationships. Knowledge graph search can be configured to focus on specific entity types, relationships, or search levels. Here’s how to utilize knowledge graph search:

Note: knowledge graphs are not constructed by default. Refer to the knowledge graph cookbook before attempting to run the command below.

r2r search --query="Who founded Airbnb?" --use-kg-search --kg-search-type=local

Key configurable parameters for knowledge graph search include:

  • use_kg_search: Enable knowledge graph search.
  • kg_search_type: Choose between “global” or “local” search.
  • kg_search_level: Specify the level of community to search.
  • entity_types: List of entity types to include in the search.
  • relationships: List of relationship types to include in the search.
  • max_community_description_length: Maximum length of community descriptions.
  • max_llm_queries_for_global_search: Limit on the number of LLM queries for global search.
  • local_search_limits: Set limits for different types of local searches.
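
The parameters above can be collected into a single settings mapping and validated before use. This is a sketch only: build_kg_search_settings is a hypothetical helper, the example values are invented, and the source does not show how these settings are passed to R2R.

```python
# Parameter names taken from the list above; anything else is rejected.
ALLOWED_KG_PARAMETERS = {
    "kg_search_level",
    "entity_types",
    "relationships",
    "max_community_description_length",
    "max_llm_queries_for_global_search",
    "local_search_limits",
}


def build_kg_search_settings(kg_search_type="local", **overrides):
    """Return a settings dict with knowledge graph search enabled (hypothetical helper)."""
    if kg_search_type not in {"global", "local"}:
        raise ValueError(f"kg_search_type must be 'global' or 'local', got {kg_search_type!r}")
    unknown = set(overrides) - ALLOWED_KG_PARAMETERS
    if unknown:
        raise ValueError(f"unknown knowledge graph parameters: {sorted(unknown)}")
    settings = {"use_kg_search": True, "kg_search_type": kg_search_type}
    settings.update(overrides)
    return settings


# Illustrative values only; valid entity types depend on how the graph was built.
kg_settings = build_kg_search_settings(
    kg_search_type="local",
    entity_types=["ORGANIZATION", "PERSON"],
)
print(kg_settings)
```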

Knowledge graph search provides structured information about entities and their relationships, complementing the text-based search results and offering a more comprehensive understanding of the data.

R2R’s search functionality is highly flexible and can be tailored to specific use cases. By adjusting these parameters, you can optimize the search process for accuracy, speed, or a balance between the two, depending on your application’s needs. The combination of vector search, hybrid search, and knowledge graph capabilities allows for powerful and context-aware information retrieval, enhancing the overall performance of your RAG applications.

Retrieval-Augmented Generation (RAG)

R2R is built around a comprehensive Retrieval-Augmented Generation (RAG) engine, allowing you to generate contextually relevant responses based on your ingested documents. The RAG process combines all the search functionality shown above with Large Language Models to produce more accurate and informative answers.

To generate a response using RAG, use the following command:

r2r rag --query="What was Uber's profit in 2020?"

Example Output:

{'results': [
    ChatCompletion(
        id='chatcmpl-9RCB5xUbDuI1f0vPw3RUO7BWQImBN',
        choices=[
            Choice(
                finish_reason='stop',
                index=0,
                logprobs=None,
                message=ChatCompletionMessage(
                    content="Uber's profit in 2020 was a net loss of $6,768 million [10].",
                    role='assistant',
                    function_call=None,
                    tool_calls=None)
                )
            ],
        created=1716268695,
        model='gpt-4o-mini',
        object='chat.completion',
        system_fingerprint=None,
        usage=CompletionUsage(completion_tokens=20, prompt_tokens=1470, total_tokens=1490)
    )
]}

This command performs a search on the ingested documents and uses the retrieved information to generate a response.
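
The example output nests the answer several levels deep. The sketch below pulls out the answer text, the bracketed citation markers, and the token usage. It uses SimpleNamespace stand-ins that mirror the attribute layout printed above, so it runs without a server. Reading the bracketed number as a reference to a retrieved source is an assumption, and the response shape may differ between R2R versions.

```python
import re
from types import SimpleNamespace

# Stand-in objects mirroring the attribute layout of the example output.
completion = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="stop",
            index=0,
            message=SimpleNamespace(
                role="assistant",
                content="Uber's profit in 2020 was a net loss of $6,768 million [10].",
            ),
        )
    ],
    model="gpt-4o-mini",
    usage=SimpleNamespace(completion_tokens=20, prompt_tokens=1470, total_tokens=1490),
)
rag_response = {"results": [completion]}

first = rag_response["results"][0]
answer = first.choices[0].message.content

# Bracketed numbers such as [10] are assumed to point at retrieved sources.
cited_sources = [int(number) for number in re.findall(r"\[(\d+)\]", answer)]

print("Answer:", answer)
print("Cited sources:", cited_sources)
print("Tokens used:", first.usage.total_tokens)
```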

Behind the scenes, R2R’s RetrievalService handles RAG requests, combining the power of vector search, optional knowledge graph integration, and language model generation. The flexible architecture allows for easy customization and extension of the RAG pipeline to meet diverse requirements.

User Auth

R2R provides robust user auth and management capabilities. This section briefly covers user authentication features and how they relate to document management.

These authentication features ensure that users can only access and manage their own documents. When performing operations like search, RAG, or document management, the results are automatically filtered based on the authenticated user’s permissions.
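
The filtering rule can be sketched in a few lines. This is an illustration of the behavior described here, not R2R's implementation: the User and Document classes and the visible_documents function are hypothetical, and treating a shared group ID as granting access is an assumption drawn from the user_id and group_ids fields mentioned earlier.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    user_id: str
    group_ids: list = field(default_factory=list)
    is_superuser: bool = False


@dataclass
class Document:
    document_id: str
    user_id: str
    group_ids: list = field(default_factory=list)


def visible_documents(user, documents):
    """Return the documents a user may see (hypothetical rule)."""
    if user.is_superuser:
        # Superusers bypass per-user gating.
        return list(documents)
    return [
        doc
        for doc in documents
        if doc.user_id == user.user_id or set(doc.group_ids) & set(user.group_ids)
    ]


documents = [
    Document("doc-a", user_id="alice"),
    Document("doc-b", user_id="bob", group_ids=["research"]),
    Document("doc-c", user_id="bob"),
]

alice = User("alice", group_ids=["research"])
admin = User("admin", is_superuser=True)

print([d.document_id for d in visible_documents(alice, documents)])
print([d.document_id for d in visible_documents(admin, documents)])
```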

Remember to replace YOUR_ACCESS_TOKEN and YOUR_REFRESH_TOKEN with actual tokens obtained during the login process.

Observability and Analytics

R2R provides robust observability and analytics features, allowing superusers to monitor system performance, track usage patterns, and gain insights into the RAG application’s behavior. These advanced features are crucial for maintaining and optimizing your R2R deployment.

Observability and analytics features are restricted to superusers only. By default, R2R is configured to treat unauthenticated users as superusers for quick testing and development. In a production environment, you should disable this setting and properly manage superuser access.

R2R offers high-level user observability for superusers:

r2r users-overview

This command returns summary information for each user. Here’s some example output:

{'results': [
    {
        'user_id': '2acb499e-8428-543b-bd85-0d9098718220',
        'num_files': 9,
        'total_size_in_bytes': 4027056,
        'document_ids': [
            '9fbe403b-c11c-5aae-8ade-ef22980c3ad1',
            'e0fc8bbc-95be-5a98-891f-c17a43fa2c3d',
            'cafdf784-a1dc-5103-8098-5b0a97db1707',
            'b21a46a4-2906-5550-9529-087697da2944',
            '9fbe403b-c11c-5aae-8ade-ef22980c3ad1',
            'f17eac52-a22e-5c75-af8f-0b25b82d43f8',
            '022fdff4-f87d-5b0c-82e4-95d53bcc4e60',
            'c5b31b3a-06d2-553e-ac3e-47c56139b484',
            'e0c2de57-171d-5385-8081-b546a2c63ce3'
        ]
    },
    ...
]}

For each user, the summary reports the number of files ingested, their total size in bytes, and the corresponding document IDs.
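
A short script can turn this output into something easier to scan. The sketch below uses the sample record from the example output. It converts the size to MiB and flags document IDs that appear more than once, which happens for one ID in the sample. The summarize_user helper is hypothetical.

```python
from collections import Counter

# The single user record from the example output.
users_overview = {
    "results": [
        {
            "user_id": "2acb499e-8428-543b-bd85-0d9098718220",
            "num_files": 9,
            "total_size_in_bytes": 4027056,
            "document_ids": [
                "9fbe403b-c11c-5aae-8ade-ef22980c3ad1",
                "e0fc8bbc-95be-5a98-891f-c17a43fa2c3d",
                "cafdf784-a1dc-5103-8098-5b0a97db1707",
                "b21a46a4-2906-5550-9529-087697da2944",
                "9fbe403b-c11c-5aae-8ade-ef22980c3ad1",
                "f17eac52-a22e-5c75-af8f-0b25b82d43f8",
                "022fdff4-f87d-5b0c-82e4-95d53bcc4e60",
                "c5b31b3a-06d2-553e-ac3e-47c56139b484",
                "e0c2de57-171d-5385-8081-b546a2c63ce3",
            ],
        }
    ]
}


def summarize_user(user):
    """Condense one users-overview record (hypothetical helper)."""
    counts = Counter(user["document_ids"])
    return {
        "user_id": user["user_id"],
        "num_files": user["num_files"],
        "size_mib": round(user["total_size_in_bytes"] / (1024 * 1024), 2),
        "unique_documents": len(counts),
        "repeated_document_ids": [doc_id for doc_id, n in counts.items() if n > 1],
    }


summaries = [summarize_user(user) for user in users_overview["results"]]
for summary in summaries:
    print(summary)
```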

R2R automatically logs various events and metrics during its operation. You can access these logs using the logs command:

r2r logs

This command returns detailed log entries for various operations, including search and RAG requests. Here’s an example of a log entry:

{
    'run_id': UUID('27f124ad-6f70-4641-89ab-f346dc9d1c2f'),
    'run_type': 'rag',
    'entries': [
        {'key': 'search_results', 'value': '["{\\"id\\":\\"7ed3a01c-88dc-5a58-a68b-6e5d9f292df2\\",...}"]'},
        {'key': 'search_query', 'value': 'Who is aristotle?'},
        {'key': 'rag_generation_latency', 'value': '3.79'},
        {'key': 'llm_response', 'value': 'Aristotle (Greek: Ἀριστοτέλης Aristotélēs; 384–322 BC) was...'}
    ]
}

These logs provide detailed information about each operation, including search results, queries, latencies, and LLM responses.
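
Since each run stores its details as key/value entries with string values, a small helper makes them easier to work with. The sketch below flattens the entries of the sample run and averages rag_generation_latency across RAG runs. The helper names are hypothetical, and the entry with elided search results is left out because its value is truncated in the example.

```python
from uuid import UUID

# The sample log entry, minus the truncated search_results value.
logs = [
    {
        "run_id": UUID("27f124ad-6f70-4641-89ab-f346dc9d1c2f"),
        "run_type": "rag",
        "entries": [
            {"key": "search_query", "value": "Who is aristotle?"},
            {"key": "rag_generation_latency", "value": "3.79"},
        ],
    }
]


def entries_to_dict(run):
    """Flatten a run's list of {'key', 'value'} entries into one dict."""
    return {entry["key"]: entry["value"] for entry in run["entries"]}


def mean_rag_latency(runs):
    """Average rag_generation_latency over RAG runs, or None if there are none."""
    latencies = []
    for run in runs:
        fields = entries_to_dict(run)
        if run["run_type"] == "rag" and "rag_generation_latency" in fields:
            # Values are logged as strings, so convert before averaging.
            latencies.append(float(fields["rag_generation_latency"]))
    return sum(latencies) / len(latencies) if latencies else None


print("Queries:", [entries_to_dict(run).get("search_query") for run in logs])
print("Mean RAG latency:", mean_rag_latency(logs))
```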

These observability and analytics features provide valuable insights into your R2R application’s performance and usage, enabling data-driven optimization and decision-making.