Introduction

This guide demonstrates how to leverage R2R’s powerful analytics and logging features. These capabilities allow you to monitor system performance, track usage patterns, and gain valuable insights into your RAG application’s behavior.

Setup

Ensure you have R2R installed and configured as described in the installation guide. For this cookbook, we’ll use the default configuration.

Basic Usage

Logging

R2R automatically logs various events and metrics during its operation. To access these logs:

from r2r import R2R

app = R2R()

# Perform some searches / RAG completions
logs = app.logs()
print(logs)

Expected Output:

[
    {
        'run_id': UUID('27f124ad-6f70-4641-89ab-f346dc9d1c2f'),
        'run_type': 'rag',
        'entries': [
            {'key': 'search_query', 'value': 'Who is Aristotle?'},
            {'key': 'search_latency', 'value': '0.39'},
            {'key': 'search_results', 'value': '["{\\"id\\":\\"7ed3a01c-88dc-5a58-a68b-6e5d9f292df2\\",...}"]'},
            {'key': 'rag_generation_latency', 'value': '3.79'},
            {'key': 'llm_response', 'value': 'Aristotle (Greek: Ἀριστοτέλης Aristotélēs; 384–322 BC) was...'}
        ]
    },
    # More log entries...
]

These logs provide detailed information about each operation, including search results, queries, latencies, and LLM responses.
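Because the logs are returned as plain Python dictionaries, you can post-process them directly. Below is a minimal sketch that pulls the search latencies out of each run, assuming the structure shown in the expected output above:

# A minimal sketch: extract search latencies from the runs returned by app.logs().
# Assumes each run has the 'run_id', 'run_type', and 'entries' fields shown above.
for run in logs:
    latencies = [
        float(entry["value"])
        for entry in run["entries"]
        if entry["key"] == "search_latency"
    ]
    print(run["run_id"], run["run_type"], latencies)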

To run this from within the quickstart, execute the following:

python -m r2r.examples.quickstart logs

Analytics

R2R offers an analytics feature that allows you to aggregate and analyze log data:

from r2r import FilterCriteria, AnalysisTypes

filter_criteria = FilterCriteria(filters={"search_latencies": "search_latency"})
analysis_types = AnalysisTypes(analysis_types={"search_latencies": ["basic_statistics", "search_latency"]})

analytics_results = app.analytics(filter_criteria, analysis_types)
print(analytics_results)

Expected Output:

{
    'results': {
        'filtered_logs': {
            'search_latencies': [
                {
                    'timestamp': '2024-06-20 21:29:06',
                    'log_id': UUID('0f28063c-8b87-4934-90dc-4cd84dda5f5c'),
                    'key': 'search_latency',
                    'value': '0.66',
                    'rn': 3
                },
                ...
            ]
        },
        'search_latencies': {
            'Mean': 0.734,
            'Median': 0.523,
            'Mode': 0.495,
            'Standard Deviation': 0.213,
            'Variance': 0.0453
        }
    }
}

The built-in analytics implementation allows you to:

  1. Filter logs based on specific criteria
  2. Perform statistical analysis on various metrics (e.g., search latencies; see the sketch after the command below)
  3. Track performance trends over time
  4. Identify potential bottlenecks or areas for optimization

To run the analytics example from the quickstart, execute:

python -m r2r.examples.quickstart analytics --filters '{"search_latencies": "search_latency"}' --analysis_types '{"search_latencies": ["basic_statistics", "search_latency"]}'
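If you prefer to work with the raw filtered entries yourself, the same basic statistics can be reproduced client-side with the standard library. A minimal sketch, assuming the response structure shown in the expected output above:

import statistics

# Assumes analytics_results follows the 'results' -> 'filtered_logs' layout shown above.
entries = analytics_results["results"]["filtered_logs"]["search_latencies"]
values = [float(entry["value"]) for entry in entries]

print("Mean:", statistics.mean(values))
print("Median:", statistics.median(values))
print("Standard Deviation:", statistics.stdev(values))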

Experimental Features

Advanced analytics features are still experimental; please reach out to the R2R team if you are interested in using them.

Custom Analytics

R2R’s analytics system is flexible and allows for custom analysis. You can specify different filters and analysis types to focus on specific aspects of your application’s performance.

# Analyze RAG latencies
rag_filter = FilterCriteria(filters={"rag_latencies": "rag_generation_latency"})
rag_analysis = AnalysisTypes(analysis_types={"rag_latencies": ["basic_statistics", "rag_generation_latency"]})

rag_analytics = app.analytics(rag_filter, rag_analysis)
print(rag_analytics)

# Track usage patterns by user
user_filter = FilterCriteria(filters={"user_patterns": "user_id"})
user_analysis = AnalysisTypes(analysis_types={"user_patterns": ["bar_chart", "user_id"]})

user_analytics = app.analytics(user_filter, user_analysis)
print(user_analytics)

# Monitor error rates
error_filter = FilterCriteria(filters={"error_rates": "error"})
error_analysis = AnalysisTypes(analysis_types={"error_rates": ["basic_statistics", "error"]})

error_analytics = app.analytics(error_filter, error_analysis)
print(error_analytics)

Preloading Data for Analysis

To get meaningful analytics, you need a substantial amount of data. Here’s a script to preload your database with random searches:

import random
from r2r import R2R, GenerationConfig

app = R2R()

# List of sample queries
queries = [
    "What is artificial intelligence?",
    "Explain machine learning.",
    "How does natural language processing work?",
    "What are neural networks?",
    "Describe deep learning.",
    # Add more queries as needed
]

# Perform random searches
for _ in range(1000):
    query = random.choice(queries)
    app.rag(query, GenerationConfig(model="gpt-3.5-turbo"))

print("Preloading complete. You can now run analytics on this data.")

After running this script, you’ll have a rich dataset to analyze using the analytics features described above.
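With the preloaded data in place, you can rerun the earlier analytics calls to get more stable statistics. As a sketch, the example below aggregates search and RAG generation latencies in a single call; combining multiple keys in one FilterCriteria/AnalysisTypes pair is an assumption here, so split it into separate calls if needed:

from r2r import FilterCriteria, AnalysisTypes

# Assumption: multiple keys can be combined in a single filter/analysis pair.
# If not, split this into two separate app.analytics() calls.
latency_filter = FilterCriteria(filters={
    "search_latencies": "search_latency",
    "rag_latencies": "rag_generation_latency",
})
latency_analysis = AnalysisTypes(analysis_types={
    "search_latencies": ["basic_statistics", "search_latency"],
    "rag_latencies": ["basic_statistics", "rag_generation_latency"],
})

print(app.analytics(latency_filter, latency_analysis))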

User-Level Analytics

To get analytics for a specific user:

user_id = "your_user_id_here"

user_filter = FilterCriteria(filters={"user_analytics": "user_id"})
user_analysis = AnalysisTypes(analysis_types={
    "user_analytics": ["basic_statistics", "user_id"],
    "user_search_latencies": ["basic_statistics", "search_latency"]
})

user_analytics = app.analytics(user_filter, user_analysis)
print(f"Analytics for user {user_id}:")
print(user_analytics)

This will give you insights into the behavior and performance of specific users in your system.
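If you need to narrow the results to a single user yourself, one option is to post-process the raw logs. A minimal sketch, assuming each run's entries include a 'user_id' key (this key is not shown in the sample output above, so treat it as an assumption):

target_user = "your_user_id_here"  # hypothetical user id for illustration

# Keep only the runs whose entries record this user_id.
user_runs = [
    run for run in app.logs()
    if any(e["key"] == "user_id" and e["value"] == target_user for e in run["entries"])
]
print(f"Found {len(user_runs)} runs for user {target_user}")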

Summary

R2R’s logging and analytics features provide powerful tools for understanding and optimizing your RAG application. By leveraging these capabilities, you can:

  • Monitor system performance in real-time
  • Analyze trends in search and RAG operations
  • Identify potential bottlenecks or areas for improvement
  • Track user behavior and usage patterns
  • Make data-driven decisions to enhance your application’s performance and user experience

For detailed setup and basic functionality, refer back to the R2R Quickstart. For more advanced usage and customization options, join the R2R Discord community.