This guide shows how to use R2R to:

  1. Launch the R2R server
  2. Ingest files into your Postgres vector database
  3. Search over ingested files
  4. Create a RAG (Retrieval-Augmented Generation) response
  5. Perform basic user auth
  6. Observe and analyze a deployed RAG engine

Be sure to complete the installation instructions before continuing with this guide.

Hello R2R

R2R gives developers configurable vector search and RAG out of the box. Its methods can also be called directly on an R2R object, rather than through the client-server architecture used throughout the docs:

r2r/examples/hello_r2r.py

from r2r import Document, GenerationConfig, R2R

app = R2R() # You may pass a custom configuration to `R2R`

app.ingest_documents(
    [
        Document(
            type="txt",
            data="John is a person that works at Google.",
            metadata={},
        )
    ]
)

# Call RAG directly on an R2R object
rag_results = app.rag(
    "Who is john", GenerationConfig(model="gpt-3.5-turbo", temperature=0.0)
)
print(f"Search Results:\n{rag_results.search_results}")
print(f"Completion:\n{rag_results.completion}")

# RAG Results:
# Search Results:
# AggregateSearchResult(vector_search_results=[VectorSearchResult(id=2d71e689-0a0e-5491-a50b-4ecb9494c832, score=0.6848798582029441, metadata={'text': 'John is a person that works at Google.', 'version': 'v0', 'chunk_order': 0, 'document_id': 'ed76b6ee-dd80-5172-9263-919d493b439a', 'extraction_id': '1ba494d7-cb2f-5f0e-9f64-76c31da11381', 'associatedQuery': 'Who is john'})], kg_search_results=None)
# Completion:
# ChatCompletion(id='chatcmpl-9g0HnjGjyWDLADe7E2EvLWa35cMkB', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='John is a person that works at Google [1].', role='assistant', function_call=None, tool_calls=None))], created=1719797903, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=11, prompt_tokens=145, total_tokens=156))

The rest of this guide demonstrates document ingestion, document management, search, and advanced RAG functionality.

Starting the server

R2R methods can be called directly as app.method(...) instead of going through the client-server architecture shown throughout the docs.

You can start the R2R server using the R2R CLI:

# run `r2r --config-name=local_ollama serve --docker` for local LLMs
r2r serve --docker

Document Ingestion and Management

R2R efficiently handles diverse document types using Postgres with pgvector, combining relational data management with vector search capabilities. This approach enables seamless ingestion, storage, and retrieval of multimodal data, while supporting flexible document management and user permissions.

Note: all document management commands are scoped to the authenticated user, except when run by a superuser.

R2R offers a powerful data ingestion process that handles various file types including html, pdf, png, mp3, and txt. The ingestion pipeline parses, chunks, embeds, and stores documents efficiently with a fully asynchronous pipeline. To demonstrate this functionality:

r2r ingest-sample-files

This command initiates the ingestion process, producing output similar to:

r2r.base.providers.vector_db_provider - INFO - Initializing VectorDBProvider with config extra_fields={} provider='pgvector' collection_name='demo_vecs'. - 2024-06-19 15:38:13,151
...
{'results': ["File 'aristotle.txt' processed successfully.", ...]}

Key features of the ingestion process:

  1. Unique document_id generation for each file
  2. Metadata association, including user_id for document management
  3. Efficient parsing, chunking, and embedding of diverse file types
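
The first and third features can be illustrated with a small standalone sketch. It does not use R2R's internals: the helper names `make_document_id` and `chunk_text`, the choice of `uuid.NAMESPACE_DNS`, and the chunk sizes are all assumptions made for illustration. It derives a stable version-5 UUID from a file name (the document IDs shown in this guide have the version-5 format) and splits text into overlapping character chunks.

```python
import uuid


def make_document_id(title: str) -> uuid.UUID:
    """Derive a deterministic version-5 UUID from a title (hypothetical helper)."""
    # The namespace is an arbitrary choice for this sketch, not R2R's actual one.
    return uuid.uuid5(uuid.NAMESPACE_DNS, title)


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into character chunks that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


doc_id = make_document_id("aristotle.txt")
chunks = chunk_text("John is a person that works at Google. " * 3)
print(doc_id, len(chunks))
```

Because the ID is derived from the input rather than generated randomly, ingesting the same title twice yields the same ID, which makes duplicates easy to detect.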

R2R’s document ingestion and management system efficiently handles diverse file types, offering customizable parsing, chunking, and embedding processes. The flexible architecture allows for easy integration with existing workflows and supports custom extensions to meet specific project requirements. R2R also provides comprehensive document management, which is covered in more detail in the cookbook.

R2R offers powerful search capabilities, including vector search, hybrid search, and knowledge graph-enhanced search. These features allow for more accurate and contextually relevant information retrieval.

To perform a basic vector search, execute the following command:

r2r search --query="What was Uber's profit in 2020?"

Example Output:

{'results': [
    {
        'id': UUID('37993d2c-b61a-58b4-9a89-f167d59b8633'),
        'score': 0.7662125334175588,
        'metadata': {
            'text': "Uber's profit in 2020 was a net loss of $6,768 million.",
            'title': 'uber_2021.pdf',
            'user_id': '2acb499e-8428-543b-bd85-0d9098718220',
            'version': 'v0',
            'chunk_order': 15,
            'document_id': 'c996e617-88a4-5c65-ab1e-948344b18d27',
            'extraction_id': 'aeba6400-1bd0-5ee9-8925-04732d675434',
            'associatedQuery': "What was Uber's profit in 2020?"
        }
    },
    # ... more results
]}

This search uses vector embeddings to find the most relevant chunks of text from the ingested documents.
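
As a rough illustration of what ranking by embedding similarity means, the sketch below scores a few chunks against a query vector and returns the best matches. It is not R2R's implementation: the three-dimensional vectors are toy values, cosine similarity is assumed as the distance measure, and `cosine_similarity` and `rank_chunks` are hypothetical helpers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec: list[float], chunks: list[tuple], top_k: int = 2) -> list[dict]:
    """Score every (text, vector) pair against the query and keep the top_k."""
    scored = [
        {"text": text, "score": cosine_similarity(query_vec, vec)}
        for text, vec in chunks
    ]
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:top_k]


# Toy embeddings; real ones come from an embedding model and have many more dimensions.
chunks = [
    ("John is a person that works at Google.", [0.0, 0.2, 0.9]),
    ("Uber's profit in 2020 was a net loss of $6,768 million.", [0.9, 0.1, 0.0]),
    ("A chunk that is only loosely related to the query.", [0.7, 0.6, 0.1]),
]
results = rank_chunks([1.0, 0.0, 0.0], chunks)
for result in results:
    print(round(result["score"], 3), result["text"])
```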

Behind the scenes, R2R’s RetrievalService handles these search requests. The search method accepts VectorSearchSettings and KGSearchSettings to customize the search behavior:

async def search(
    self,
    query: str,
    vector_search_settings: VectorSearchSettings = VectorSearchSettings(),
    kg_search_settings: KGSearchSettings = KGSearchSettings(),
):
    # ... implementation details ...

This flexible architecture allows for combining different search strategies to provide the most relevant and comprehensive results for your queries.
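
One common way to combine rankings from different search strategies is reciprocal rank fusion (RRF). The sketch below is an illustration of that general technique, not a description of how R2R merges results: the function name, the constant `k = 60`, and the chunk identifiers are assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of IDs; each appearance adds 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda item_id: scores[item_id], reverse=True)


# Hypothetical rankings from a vector search and a keyword search.
vector_ranking = ["chunk_a", "chunk_b", "chunk_c"]
keyword_ranking = ["chunk_b", "chunk_d", "chunk_a"]
fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(fused)
```

Items that rank well in both lists rise to the top, while items found by only one strategy are still kept.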

Retrieval-Augmented Generation (RAG)

R2R is built around a comprehensive Retrieval-Augmented Generation (RAG) engine, allowing you to generate contextually relevant responses based on your ingested documents. The RAG process combines the search functionality with language model generation to produce more accurate and informative answers.

To generate a response using RAG, use the following command:

r2r rag --query="What was Uber's profit in 2020?"

Example Output:

{'results': [
    ChatCompletion(
        id='chatcmpl-9RCB5xUbDuI1f0vPw3RUO7BWQImBN',
        choices=[
            Choice(
                finish_reason='stop',
                index=0,
                logprobs=None,
                message=ChatCompletionMessage(
                    content="Uber's profit in 2020 was a net loss of $6,768 million [10].",
                    role='assistant',
                    function_call=None,
                    tool_calls=None
                )
            )
        ],
        created=1716268695,
        model='gpt-3.5-turbo-0125',
        object='chat.completion',
        system_fingerprint=None,
        usage=CompletionUsage(completion_tokens=20, prompt_tokens=1470, total_tokens=1490)
    )
]}

This command performs a search on the ingested documents and uses the retrieved information to generate a response.

Behind the scenes, R2R’s RetrievalService handles RAG requests, combining the power of vector search, optional knowledge graph integration, and language model generation. The flexible architecture allows for easy customization and extension of the RAG pipeline to meet diverse requirements.
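
To make the "search, then generate" flow concrete, here is a minimal sketch of the step in between: turning search results into a numbered context that a language model can cite, as in the `[10]` citation above. The prompt wording and the `build_rag_prompt` helper are assumptions for illustration, not R2R's actual prompt template; the result shape follows the search output shown earlier.

```python
def build_rag_prompt(query: str, search_results: list[dict]) -> str:
    """Number each retrieved chunk and place it ahead of the user's query."""
    context = "\n".join(
        f"[{i}] {result['metadata']['text']}"
        for i, result in enumerate(search_results, start=1)
    )
    return (
        "Answer the query using only the numbered context below, "
        "citing sources like [1].\n\n"
        f"Context:\n{context}\n\n"
        f"Query: {query}"
    )


search_results = [
    {
        "score": 0.7662125334175588,
        "metadata": {"text": "Uber's profit in 2020 was a net loss of $6,768 million."},
    },
]
prompt = build_rag_prompt("What was Uber's profit in 2020?", search_results)
print(prompt)
```

The resulting string would then be sent to the language model, whose completion is what the rag command returns.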

User Auth

R2R provides robust user auth and management capabilities. This section briefly covers user authentication features and how they relate to document management.

These authentication features ensure that users can only access and manage their own documents. When performing operations like search, RAG, or document management, the results are automatically filtered based on the authenticated user’s permissions.

Remember to replace YOUR_ACCESS_TOKEN and YOUR_REFRESH_TOKEN with actual tokens obtained during the login process.
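
The per-user filtering described above can be pictured with a short sketch. It assumes each result carries a `user_id` in its metadata, as in the search output earlier in this guide; the `filter_results_for_user` helper and the sample IDs are hypothetical, and R2R may apply this filter elsewhere (for example in the database query itself).

```python
def filter_results_for_user(
    results: list[dict], user_id: str, is_superuser: bool = False
) -> list[dict]:
    """Return only the results owned by user_id; superusers see everything."""
    if is_superuser:
        return list(results)
    return [r for r in results if r["metadata"].get("user_id") == user_id]


results = [
    {"id": "r1", "metadata": {"user_id": "user-a", "title": "uber_2021.pdf"}},
    {"id": "r2", "metadata": {"user_id": "user-b", "title": "aristotle.txt"}},
]
visible_to_a = filter_results_for_user(results, "user-a")
visible_to_admin = filter_results_for_user(results, "admin", is_superuser=True)
print(len(visible_to_a), len(visible_to_admin))
```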

Observability and Analytics

R2R provides robust observability and analytics features, allowing superusers to monitor system performance, track usage patterns, and gain insights into the RAG application’s behavior. These advanced features are crucial for maintaining and optimizing your R2R deployment.

Observability and analytics features are restricted to superusers only. By default, R2R is configured to treat unauthenticated users as superusers for quick testing and development. In a production environment, you should disable this setting and properly manage superuser access.

R2R offers high-level user observability for superusers:

r2r users-overview

This command returns per-user information. Here is some example output:

{'user_id': '2acb499e-8428-543b-bd85-0d9098718220', 'num_files': 9, 'total_size_in_bytes': 4027056, 'document_ids': ['93123a68-d668-51de-8291-92162730dc87', 'e0fc8bbc-95be-5a98-891f-c17a43fa2c3d', 'cafdf784-a1dc-5103-8098-5b0a97db1707', 'b21a46a4-2906-5550-9529-087697da2944', 'b736292c-11e6-5453-9686-055da3edb866', 'f17eac52-a22e-5c75-af8f-0b25b82d43f8', '022fdff4-f87d-5b0c-82e4-95d53bcc4e60', 'c5b31b3a-06d2-553e-ac3e-47c56139b484', 'e0c2de57-171d-5385-8081-b546a2c63ce3']}

For each user, this summary reports the number of files ingested, the total size of those files in bytes, and the corresponding document IDs.
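
A summary of this shape can be computed by grouping document records by user. The sketch below shows one way to do that; the input record fields (`size_in_bytes`, for instance) and the `summarize_users` helper are assumptions, and only the output keys mirror the example above.

```python
def summarize_users(documents: list[dict]) -> list[dict]:
    """Group document records by user and total their count and size."""
    per_user: dict[str, dict] = {}
    for doc in documents:
        stats = per_user.setdefault(
            doc["user_id"],
            {"num_files": 0, "total_size_in_bytes": 0, "document_ids": []},
        )
        stats["num_files"] += 1
        stats["total_size_in_bytes"] += doc["size_in_bytes"]
        stats["document_ids"].append(doc["document_id"])
    return [{"user_id": user_id, **stats} for user_id, stats in per_user.items()]


# Hypothetical document records.
documents = [
    {"user_id": "user-a", "document_id": "doc-1", "size_in_bytes": 1200},
    {"user_id": "user-a", "document_id": "doc-2", "size_in_bytes": 800},
    {"user_id": "user-b", "document_id": "doc-3", "size_in_bytes": 500},
]
overview = summarize_users(documents)
print(overview)
```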

R2R automatically logs various events and metrics during its operation. You can access these logs using the logs command:

r2r logs

This command returns detailed log entries for various operations, including search and RAG requests. Here’s an example of a log entry:

{
    'run_id': UUID('27f124ad-6f70-4641-89ab-f346dc9d1c2f'),
    'run_type': 'rag',
    'entries': [
        {'key': 'search_results', 'value': '["{\\"id\\":\\"7ed3a01c-88dc-5a58-a68b-6e5d9f292df2\\",...}"]'},
        {'key': 'search_query', 'value': 'Who is aristotle?'},
        {'key': 'rag_generation_latency', 'value': '3.79'},
        {'key': 'llm_response', 'value': 'Aristotle (Greek: Ἀριστοτέλης Aristotélēs; 384–322 BC) was...'}
    ]
}

These logs provide detailed information about each operation, including search results, queries, latencies, and LLM responses.
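
Once log entries are available as Python dictionaries in the shape shown above, simple metrics can be derived from them. The following sketch averages the `rag_generation_latency` values across RAG runs; the `average_rag_latency` helper and the sample values are hypothetical, and how the logs are loaded is left out.

```python
from statistics import mean
from typing import Optional


def average_rag_latency(logs: list[dict]) -> Optional[float]:
    """Average the rag_generation_latency values of all 'rag' runs, if any."""
    latencies = [
        float(entry["value"])  # values are stored as strings in the log entries
        for log in logs
        if log.get("run_type") == "rag"
        for entry in log["entries"]
        if entry["key"] == "rag_generation_latency"
    ]
    return mean(latencies) if latencies else None


# Hypothetical log entries following the structure of the example above.
logs = [
    {"run_type": "rag", "entries": [{"key": "rag_generation_latency", "value": "3.79"}]},
    {"run_type": "rag", "entries": [{"key": "rag_generation_latency", "value": "2.21"}]},
    {"run_type": "search", "entries": [{"key": "search_query", "value": "Who is aristotle?"}]},
]
print(average_rag_latency(logs))
```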

These observability and analytics features provide valuable insights into your R2R application’s performance and usage, enabling data-driven optimization and decision-making.