Maintenance & Scaling
Learn how to maintain and scale your R2R system
User management features are currently restricted to:
- Self-hosted instances
- Enterprise tier cloud accounts
Contact our sales team for Enterprise pricing and features.
This guide covers essential maintenance tasks for R2R deployments, with a focus on vector index management and system updates. Understanding when and how to build vector indices, as well as keeping your R2R installation current, is crucial for maintaining optimal performance at scale.
Vector Indices
Do You Need Vector Indices?
Vector indices are not necessary for all deployments, especially in multi-user applications where each user typically queries their own subset of documents. Consider that:
- In multi-user applications, queries are usually filtered by user_id, drastically reducing the actual number of vectors being searched
- A system with 1 million total vectors but 1000 users might only search through 1000 vectors per query
- Performance impact of not having indices is minimal when searching small per-user document sets
Only consider implementing vector indices when:
- Individual users are searching across hundreds of thousands of documents
- Query latency becomes a bottleneck even with user-specific filtering
- You need to support cross-user search functionality at scale
For development environments or smaller deployments, the overhead of maintaining vector indices often outweighs their benefits.
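The reasoning above can be made concrete with a little arithmetic. The sketch below assumes vectors are spread evenly across users, and its thresholds (100,000 vectors per user, a 200 ms latency budget) are illustrative placeholders, not R2R recommendations.

```python
def vectors_per_query(total_vectors: int, num_users: int) -> int:
    """Estimate vectors scanned per query when results are filtered by user_id.

    Assumes an even spread of vectors across users (a simplification).
    """
    if num_users <= 0:
        raise ValueError("num_users must be positive")
    return total_vectors // num_users


def should_consider_index(
    per_user_vectors: int,
    observed_latency_ms: float,
    cross_user_search: bool,
    vector_threshold: int = 100_000,   # hypothetical cutoff
    latency_budget_ms: float = 200.0,  # hypothetical latency budget
) -> bool:
    """Return True if any of the three conditions listed above holds."""
    return (
        per_user_vectors >= vector_threshold
        or observed_latency_ms > latency_budget_ms
        or cross_user_search
    )


if __name__ == "__main__":
    scanned = vectors_per_query(1_000_000, 1_000)
    print(scanned)  # 1000
    print(should_consider_index(scanned, observed_latency_ms=35.0, cross_user_search=False))
```

Real deployments are rarely this uniform, so it is worth checking the largest users separately rather than relying on the average.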
Vector Index Management
R2R supports multiple indexing methods; HNSW (Hierarchical Navigable Small World) is recommended for most use cases.
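As a rough illustration, an HNSW index request can be described by a small configuration object that is validated before it is sent anywhere. The class and field names here are hypothetical and are not the R2R API; the parameter ranges and distance measures are the ones listed under Performance Tuning in this guide.

```python
from dataclasses import dataclass, asdict

# Distance measures named in this guide.
SUPPORTED_MEASURES = {"cosine_distance", "l2_distance", "max_inner_product"}


@dataclass(frozen=True)
class HnswIndexConfig:
    """Hypothetical description of an HNSW index request (not the R2R API)."""

    index_measure: str = "cosine_distance"
    m: int = 16
    ef_construction: int = 64
    index_method: str = "hnsw"

    def __post_init__(self) -> None:
        if self.index_measure not in SUPPORTED_MEASURES:
            raise ValueError(f"unknown distance measure: {self.index_measure}")
        # Ranges follow the tuning guidance in this guide.
        if not 16 <= self.m <= 64:
            raise ValueError("m should be in the range 16-64")
        if not 64 <= self.ef_construction <= 100:
            raise ValueError("ef_construction should be in the range 64-100")


if __name__ == "__main__":
    config = HnswIndexConfig(m=32, ef_construction=80)
    print(asdict(config))
```

Validating up front is a design choice: index builds are expensive, so rejecting an out-of-range value before the build starts is cheaper than discovering it afterwards.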
Important Considerations
- Pre-warming Requirement
  - New indices start “cold” and require warming for optimal performance
  - Initial queries will be slower until the index is loaded into memory
  - Consider implementing explicit pre-warming in production
  - Warming must be repeated after system restarts
- Resource Usage
  - Index creation is CPU and memory intensive
  - Memory usage scales with both dataset size and the m parameter
  - Consider creating indices during off-peak hours
- Performance Tuning
  - HNSW Parameters:
    - m: 16-64 (higher = better quality, more memory)
    - ef_construction: 64-100 (higher = better quality, longer build time)
  - Distance Measures:
    - cosine_distance: Best for normalized vectors (most common)
    - l2_distance: Better for absolute distances
    - max_inner_product: Optimized for dot product similarity
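The pre-warming consideration above can be sketched as a small routine that replays representative queries after an index is built or the system restarts. The `search` callable is a placeholder for whatever function issues a vector search in your deployment, and the choice of two rounds is arbitrary.

```python
import time
from typing import Callable, Iterable, List


def prewarm_index(
    search: Callable[[str], object],
    warmup_queries: Iterable[str],
    rounds: int = 2,
) -> List[float]:
    """Run representative queries so the index is pulled into memory.

    Returns the elapsed seconds for each round, which would usually
    shrink as the index warms up.
    """
    queries = list(warmup_queries)
    timings: List[float] = []
    for _ in range(rounds):
        start = time.perf_counter()
        for query in queries:
            search(query)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    calls = []

    def fake_search(query: str) -> list:
        # Stand-in for a real search call; records the query only.
        calls.append(query)
        return []

    durations = prewarm_index(fake_search, ["billing policy", "onboarding steps"])
    print(len(calls), len(durations))  # 4 2
```

Hooking a routine like this into the service start-up path is one way to make sure warming is repeated after every restart.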
Scaling Strategies
Horizontal Scaling
For applications serving many users, it is advantageous to scale the number of R2R replicas horizontally. This improves concurrent handling of requests and reliability.
- Load Balancing
  - Deploy multiple R2R replicas behind a load balancer
  - Requests are distributed amongst the replicas
  - Particularly effective since most queries are user-specific
- Sharding
  - Consider sharding by user_id for large multi-user deployments
  - Each shard handles a subset of users
  - Maintains performance even with millions of total documents
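Sharding by user_id needs a routing rule that sends a given user to the same shard every time. A minimal sketch, assuming a fixed shard count and simple hash-modulo routing (the function name is illustrative, and this is not an R2R feature):

```python
import hashlib


def shard_for_user(user_id: str, num_shards: int) -> int:
    """Map a user_id to a shard index in [0, num_shards) using a stable hash."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer. Python's built-in hash()
    # is salted per process for strings, so it is not suitable for routing.
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for user in ("user-1", "user-2", "user-3"):
        print(user, "->", shard_for_user(user, num_shards=4))
```

Plain modulo routing moves most users when the shard count changes; a consistent-hashing scheme reduces that churn if shards are expected to be added over time.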
Horizontal Scaling with Docker Swarm
R2R ships with an example compose file to deploy to Swarm, an advanced Docker feature that manages a cluster of Docker daemons.
After cloning the R2R repository, initialize Swarm (docker swarm init) and deploy the stack from the example compose file (docker stack deploy).
Vertical Scaling
For applications requiring large single-user searches:
- Cloud Provider Solutions
  - AWS RDS supports up to 1 billion vectors per instance
  - Scale up compute and memory resources as needed
  - Example instance types:
    - db.r6g.16xlarge: Suitable for up to 100M vectors
    - db.r6g.metal: Can handle 1B+ vectors
- Memory Optimization
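When sizing an instance, a rough lower bound on memory is the raw size of the embeddings themselves. The sketch below assumes float32 storage (4 bytes per dimension) and a 768-dimensional embedding model; both are assumptions, not figures from this guide, and index structures and row overhead come on top.

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_vectors: int, dimensions: int) -> int:
    """Bytes needed to hold the embeddings alone, stored as float32."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes."""
    return num_bytes / (1024 ** 3)


if __name__ == "__main__":
    for count in (1_000_000, 100_000_000, 1_000_000_000):
        size = to_gib(raw_vector_bytes(count, dimensions=768))
        print(f"{count:>13,} vectors x 768 dims ~ {size:,.1f} GiB")
```

Because the estimate grows linearly with the embedding dimension, a smaller embedding model is often the most direct lever on memory.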
Multi-User Considerations
- Filtering Optimization
- Collection Management
  - Group related documents into collections
  - Enable efficient access control
  - Optimize search scope
- Resource Allocation
  - Monitor per-user resource usage
  - Implement usage quotas if needed
  - Consider dedicated instances for power users
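The effect of filtering and collections on search scope can be illustrated with a filter-first search over an in-memory list. This is a toy model with hypothetical names (`Chunk`, `scoped_search`), not how R2R executes queries; it only shows that ranking runs over the filtered subset, not the full corpus.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Chunk:
    user_id: str
    collection_id: str
    embedding: Sequence[float]
    text: str


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def scoped_search(
    chunks: List[Chunk],
    query_embedding: Sequence[float],
    user_id: str,
    collection_id: Optional[str] = None,
    top_k: int = 3,
) -> List[Chunk]:
    """Filter by user (and optionally collection) first, then rank what is left."""
    candidates = [
        c for c in chunks
        if c.user_id == user_id
        and (collection_id is None or c.collection_id == collection_id)
    ]
    candidates.sort(
        key=lambda c: cosine_similarity(c.embedding, query_embedding),
        reverse=True,
    )
    return candidates[:top_k]


if __name__ == "__main__":
    store = [
        Chunk("alice", "contracts", [1.0, 0.0], "alice contract"),
        Chunk("alice", "notes", [0.0, 1.0], "alice note"),
        Chunk("bob", "contracts", [1.0, 0.0], "bob contract"),
    ]
    for hit in scoped_search(store, [0.9, 0.1], user_id="alice"):
        print(hit.text)
```

The same filter doubles as access control: a chunk owned by another user never becomes a candidate, so it cannot appear in the results.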
Performance Monitoring
Monitor these metrics to inform scaling decisions:
- Query Performance
  - Average query latency per user
  - Number of vectors searched per query
  - Cache hit rates
- System Resources
  - Memory usage per instance
  - CPU utilization
  - Storage growth rate
- User Patterns
  - Number of active users
  - Query patterns and peak usage times
  - Document count per user
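Several of the query-level metrics above can be derived from a simple per-query log. The sketch below assumes a hypothetical `QueryRecord` shape that your own logging would need to supply; it is not an R2R data structure.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable


@dataclass
class QueryRecord:
    user_id: str
    latency_ms: float
    vectors_searched: int
    cache_hit: bool


def summarize(records: Iterable[QueryRecord]) -> Dict[str, Dict[str, float]]:
    """Aggregate per-user latency, vectors searched, and cache hit rate."""
    by_user = defaultdict(list)
    for record in records:
        by_user[record.user_id].append(record)
    return {
        user: {
            "avg_latency_ms": mean(r.latency_ms for r in rows),
            "avg_vectors_searched": mean(r.vectors_searched for r in rows),
            "cache_hit_rate": sum(r.cache_hit for r in rows) / len(rows),
        }
        for user, rows in by_user.items()
    }


if __name__ == "__main__":
    log = [
        QueryRecord("alice", 40.0, 1_000, True),
        QueryRecord("alice", 60.0, 3_000, False),
        QueryRecord("bob", 25.0, 500, True),
    ]
    for user, stats in summarize(log).items():
        print(user, stats)
```

Averages hide outliers, so tracking a high percentile of latency alongside the mean is usually worthwhile before deciding to build an index or add replicas.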