Maintenance & Scaling

Learn how to maintain and scale your R2R system

This guide covers essential maintenance tasks for R2R deployments, with a focus on vector index management and system updates. Understanding when and how to build vector indices, as well as keeping your R2R installation current, is crucial for maintaining optimal performance at scale.

Vector Indices

Do You Need Vector Indices?

Vector indices are not necessary for all deployments, especially in multi-user applications where each user typically queries their own subset of documents. Consider that:

  • In multi-user applications, queries are usually filtered by user_id, drastically reducing the actual number of vectors being searched
  • A system with 1 million total vectors but 1000 users might only search through 1000 vectors per query
  • Performance impact of not having indices is minimal when searching small per-user document sets

Only consider implementing vector indices when:

  • Individual users are searching across hundreds of thousands of documents
  • Query latency becomes a bottleneck even with user-specific filtering
  • You need to support cross-user search functionality at scale

For development environments or smaller deployments, the overhead of maintaining vector indices often outweighs their benefits.

Vector Index Management

R2R supports multiple indexing methods, with HNSW (Hierarchical Navigable Small World) being recommended for most use cases:

1# Create vector index
2create_response = client.create_vector_index(
3 table_name="vectors",
4 index_method="hnsw",
5 index_measure="cosine_distance",
6 index_arguments={
7 "m": 16, # Number of connections per element
8 "ef_construction": 64 # Size of dynamic candidate list
9 },
10 concurrently=True
11)
12
13# List existing indices
14indices = client.list_vector_indices(table_name="vectors")
15
16# Delete an index
17delete_response = client.delete_vector_index(
18 index_name="ix_vector_cosine_ops_hnsw__20241021211541",
19 table_name="vectors",
20 concurrently=True
21)

Important Considerations

  1. Pre-warming Requirement

    • New indices start “cold” and require warming for optimal performance
    • Initial queries will be slower until the index is loaded into memory
    • Consider implementing explicit pre-warming in production
    • Warming must be repeated after system restarts
  2. Resource Usage

    • Index creation is CPU and memory intensive
    • Memory usage scales with both dataset size and m parameter
    • Consider creating indices during off-peak hours
  3. Performance Tuning

    • HNSW Parameters:
      • m: 16-64 (higher = better quality, more memory)
      • ef_construction: 64-100 (higher = better quality, longer build time)
    • Distance Measures:
      • cosine_distance: Best for normalized vectors (most common)
      • l2_distance: Better for absolute distances
      • max_inner_product: Optimized for dot product similarity

System Updates and Maintenance

Version Management

Check your current R2R version:

$r2r version

Update Process

  1. Prepare for Update

    $# Check current versions
    >r2r version
    >r2r db current
    >
    ># Generate system report (optional)
    >r2r generate-report
  2. Stop Running Services

    $r2r docker-down
  3. Update R2R

    $r2r update
  4. Update Database

    $r2r db upgrade
  5. Restart Services

    $r2r serve --docker [additional options]

Database Migration Management

R2R uses database migrations to manage schema changes. Always check and update your database schema after updates:

$# Check current migration
>r2r db current
>
># Apply migrations
>r2r db upgrade

Managing Multiple Environments

Use different project names and schemas for different environments:

$# Development
>export R2R_PROJECT_NAME=r2r_dev
>r2r serve --docker --project-name r2r-dev
>
># Staging
>export R2R_PROJECT_NAME=r2r_staging
>r2r serve --docker --project-name r2r-staging
>
># Production
>export R2R_PROJECT_NAME=r2r_prod
>r2r serve --docker --project-name r2r-prod

Troubleshooting

If issues occur:

  1. Generate a system report:

    $r2r generate-report
  2. Check container health:

    $r2r docker-down
    >r2r serve --docker
  3. Review database state:

    $r2r db current
    >r2r db history
  4. Roll back if needed:

    $r2r db downgrade --revision <previous-working-version>

Scaling Strategies

Horizontal Scaling

For applications serving many users:

  1. Load Balancing

    • Deploy multiple R2R instances behind a load balancer
    • Each instance can handle a subset of users
    • Particularly effective since most queries are user-specific
  2. Sharding

    • Consider sharding by user_id for large multi-user deployments
    • Each shard handles a subset of users
    • Maintains performance even with millions of total documents

Vertical Scaling

For applications requiring large single-user searches:

  1. Cloud Provider Solutions

    • AWS RDS supports up to 1 billion vectors per instance
    • Scale up compute and memory resources as needed
    • Example instance types:
      • db.r6g.16xlarge: Suitable for up to 100M vectors
      • db.r6g.metal: Can handle 1B+ vectors
  2. Memory Optimization

    1# Optimize for large vector collections
    2client.create_vector_index(
    3 table_name="vectors",
    4 index_method="hnsw",
    5 index_arguments={
    6 "m": 32, # Increased for better performance
    7 "ef_construction": 80 # Balanced for large collections
    8 }
    9)

Multi-User Considerations

  1. Filtering Optimization

    1# Efficient per-user search
    2response = client.search(
    3 "query",
    4 vector_search_settings={
    5 "search_filters": {
    6 "user_id": {"$eq": "current_user_id"}
    7 }
    8 }
    9)
  2. Collection Management

    • Group related documents into collections
    • Enable efficient access control
    • Optimize search scope
  3. Resource Allocation

    • Monitor per-user resource usage
    • Implement usage quotas if needed
    • Consider dedicated instances for power users

Performance Monitoring

Monitor these metrics to inform scaling decisions:

  1. Query Performance

    • Average query latency per user
    • Number of vectors searched per query
    • Cache hit rates
  2. System Resources

    • Memory usage per instance
    • CPU utilization
    • Storage growth rate
  3. User Patterns

    • Number of active users
    • Query patterns and peak usage times
    • Document count per user

Additional Resources