Maintenance & Scaling

Learn how to maintain and scale your R2R system

User management features are currently restricted to:

  • Self-hosted instances
  • Enterprise tier cloud accounts

Contact our sales team for Enterprise pricing and features.

This guide covers essential maintenance tasks for R2R deployments, with a focus on vector index management and system updates. Understanding when and how to build vector indices, as well as keeping your R2R installation current, is crucial for maintaining optimal performance at scale.

Vector Indices

Do You Need Vector Indices?

Vector indices are not necessary for every deployment, particularly in multi-user applications where each user typically queries only their own subset of documents. Consider that:

  • In multi-user applications, queries are usually filtered by user_id, drastically reducing the actual number of vectors being searched
  • A system with 1 million total vectors but 1000 users might only search through 1000 vectors per query
  • The performance impact of forgoing indices is minimal when searching small per-user document sets

Only consider implementing vector indices when:

  • Individual users are searching across hundreds of thousands of documents
  • Query latency becomes a bottleneck even with user-specific filtering
  • You need to support cross-user search functionality at scale

For development environments or smaller deployments, the overhead of maintaining vector indices often outweighs their benefits.

Vector Index Management

R2R supports multiple indexing methods, with HNSW (Hierarchical Navigable Small World) being recommended for most use cases:

```python
# Create a vector index
from r2r import R2RClient

client = R2RClient()

create_response = client.indices.create(
    {
        "table_name": "vectors",
        "index_method": "hnsw",
        "index_measure": "cosine_distance",
        "index_arguments": {
            "m": 16,                # Number of connections per element
            "ef_construction": 64,  # Size of dynamic candidate list
        },
    }
)

# List existing indices
indices = client.indices.list()

# Delete an index
delete_response = client.indices.delete(
    index_name="ix_vector_cosine_ops_hnsw__20241021211541",
    table_name="vectors",
)
print("delete_response =", delete_response)
```

Important Considerations

  1. Pre-warming Requirement

    • New indices start “cold” and require warming for optimal performance
    • Initial queries will be slower until the index is loaded into memory
    • Consider implementing explicit pre-warming in production
    • Warming must be repeated after system restarts
  2. Resource Usage

    • Index creation is CPU and memory intensive
    • Memory usage scales with both dataset size and m parameter
    • Consider creating indices during off-peak hours
  3. Performance Tuning

    • HNSW Parameters:
      • m: 16-64 (higher = better quality, more memory)
      • ef_construction: 64-100 (higher = better quality, longer build time)
    • Distance Measures:
      • cosine_distance: Best for normalized vectors (most common)
      • l2_distance: Better for absolute distances
      • max_inner_product: Optimized for dot product similarity
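
Since warming must be repeated after every restart, it is worth wiring into startup. Below is a minimal sketch of such a routine, not an R2R API: `prewarm_index` and `search_fn` are hypothetical names, and in practice the search callable would wrap `client.retrieval.search` with a few representative queries.

```python
# Hypothetical pre-warming helper (not an R2R API). New HNSW indices start
# "cold", so we run a few representative queries at startup to page the
# index into memory before real traffic arrives.
def prewarm_index(search_fn, warmup_queries):
    """Run each warm-up query once, discarding the results."""
    warmed = 0
    for query in warmup_queries:
        search_fn(query)  # result ignored; the goal is to touch index pages
        warmed += 1
    return warmed

# Usage with an R2R client (assumed):
# prewarm_index(lambda q: client.retrieval.search(q), ["typical query"])
```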

Scaling Strategies

Horizontal Scaling

For applications serving many users, scale horizontally by running multiple R2R replicas. This improves concurrent request handling and reliability.

  1. Load Balancing

    • Deploy multiple R2R replicas behind a load balancer
    • Requests are distributed amongst the replicas
    • Particularly effective since most queries are user-specific
  2. Sharding

    • Consider sharding by user_id for large multi-user deployments
    • Each shard handles a subset of users
    • Maintains performance even with millions of total documents
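
The user_id sharding idea can be illustrated with a small routing function. This is a sketch, not an R2R feature: `shard_for_user` is a hypothetical helper that deterministically maps a user to one of N shards.

```python
import hashlib

# Illustrative shard router (not part of R2R): hash the user_id so that a
# given user's documents and queries always land on the same shard.
def shard_for_user(user_id: str, num_shards: int) -> int:
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

Hash-based routing keeps assignments stable as long as the shard count is fixed; adding shards later requires a resharding scheme such as consistent hashing.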

Horizontal Scaling with Docker Swarm

R2R ships with an example compose file for deploying to Docker Swarm, a Docker feature that manages a cluster of Docker daemons.

After cloning the R2R repository, we can initialize Swarm and start our stack:

```shell
# Set the number of R2R replicas to create; defaults to 3 if not set
export R2R_REPLICAS=3

# Initialize swarm (if not already running)
docker swarm init

# Create the overlay network
docker network create --driver overlay r2r_r2r-network

# Source the environment file
set -a
source /path/to/.env
set +a

# Deploy the stack
docker stack deploy -c R2R/py/r2r/compose.swarm.yaml r2r

# Bring the stack down (when needed)
docker stack rm r2r
```

Vertical Scaling

For applications requiring large single-user searches:

  1. Cloud Provider Solutions

    • AWS RDS supports up to 1 billion vectors per instance
    • Scale up compute and memory resources as needed
    • Example instance types:
      • db.r6g.16xlarge: Suitable for up to 100M vectors
      • db.r6g.metal: Can handle 1B+ vectors
  2. Memory Optimization

    ```python
    # Optimize for large vector collections
    client.indices.create(
        table_name="vectors",
        index_method="hnsw",
        index_arguments={
            "m": 32,                # Increased for better performance
            "ef_construction": 80,  # Balanced for large collections
        },
    )
    ```

Multi-User Considerations

  1. Filtering Optimization

    ```python
    # Efficient per-user search
    response = client.retrieval.search(
        "query",
        search_settings={
            "filters": {
                "user_id": {"$eq": "current_user_id"}
            }
        },
    )
    ```
  2. Collection Management

    • Group related documents into collections
    • Enable efficient access control
    • Optimize search scope
  3. Resource Allocation

    • Monitor per-user resource usage
    • Implement usage quotas if needed
    • Consider dedicated instances for power users
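
A usage quota can start as a simple in-memory counter. The sketch below is illustrative only (`QueryQuota` is a hypothetical class, not an R2R feature); a production version would persist counts and reset them on a schedule.

```python
from collections import defaultdict

# Illustrative per-user quota (not an R2R feature): reject requests once a
# user exceeds a fixed limit.
class QueryQuota:
    def __init__(self, limit: int):
        self.limit = limit
        self.counts = defaultdict(int)

    def allow(self, user_id: str) -> bool:
        """Return True and count the request, or False if over quota."""
        if self.counts[user_id] >= self.limit:
            return False
        self.counts[user_id] += 1
        return True
```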

Performance Monitoring

Monitor these metrics to inform scaling decisions:

  1. Query Performance

    • Average query latency per user
    • Number of vectors searched per query
    • Cache hit rates
  2. System Resources

    • Memory usage per instance
    • CPU utilization
    • Storage growth rate
  3. User Patterns

    • Number of active users
    • Query patterns and peak usage times
    • Document count per user
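
As a starting point for the latency metric, queries can be timed in-process with a small tracker. The names below are assumptions rather than an R2R API; in production these numbers would normally be exported to a monitoring system.

```python
from collections import defaultdict

# Hypothetical per-user latency tracker: record each query's duration and
# report a running average per user.
class LatencyTracker:
    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def record(self, user_id: str, seconds: float) -> None:
        self.totals[user_id] += seconds
        self.counts[user_id] += 1

    def average(self, user_id: str) -> float:
        if self.counts[user_id] == 0:
            return 0.0
        return self.totals[user_id] / self.counts[user_id]
```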