Database
Learn how to configure and use the database provider in R2R
Introduction
R2R’s DatabaseProvider offers a unified interface for both relational and vector database operations. R2R currently supports only Postgres with the pgvector extension as its database backend.
Postgres was selected to power R2R because it is free, open source, and widely considered to be a stable, state-of-the-art database. In addition, the Postgres community maintains pgvector, which stores and retrieves vector embeddings efficiently alongside traditional relational data.
Architecture Overview
The Database Provider in R2R is built on two primary components:
- Vector Database Provider: Handles vector-based operations such as similarity search and hybrid search.
- Relational Database Provider: Manages traditional relational database operations, including user management and document metadata storage.
These providers work in tandem to ensure efficient data management and retrieval.
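The split described above can be pictured as one object that composes two narrower stores. The following is a minimal sketch of that shape only: the class names `VectorStore`, `RelationalStore`, and `Database`, and the `ingest` method, are hypothetical and are not taken from the R2R codebase.

```python
from dataclasses import dataclass, field


@dataclass
class VectorStore:
    """Hypothetical stand-in for the vector side: keeps embeddings by id."""
    vectors: dict = field(default_factory=dict)

    def upsert(self, entry_id, vector):
        self.vectors[entry_id] = list(vector)


@dataclass
class RelationalStore:
    """Hypothetical stand-in for the relational side: keeps document metadata by id."""
    documents: dict = field(default_factory=dict)

    def upsert_document(self, doc_id, metadata):
        self.documents[doc_id] = dict(metadata)


@dataclass
class Database:
    """Composes both stores so callers use a single entry point."""
    vector: VectorStore = field(default_factory=VectorStore)
    relational: RelationalStore = field(default_factory=RelationalStore)

    def ingest(self, doc_id, vector, metadata):
        # Metadata goes to the relational side, the embedding to the vector side.
        self.relational.upsert_document(doc_id, metadata)
        self.vector.upsert(doc_id, vector)


db = Database()
db.ingest("doc-1", [0.1, 0.2, 0.3], {"title": "Example"})
```

Keeping the two stores behind one object lets both sides share a single Postgres connection configuration while exposing separate operations.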
Configuration
Update the database section in your r2r.toml file:
Alternatively, you can set these values using environment variables:
If a value is set in both places, the environment variable takes precedence over the config file. The R2R Docker setup includes configuration options that simplify integration with a combined Postgres and pgvector database.
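To illustrate the precedence rule, here is a minimal sketch of how settings from a config file might be merged with environment variables. The variable names in `ENV_KEYS` and the helper `resolve_db_settings` are assumptions made for this example; check your deployment for the names R2R actually reads.

```python
import os

# Assumed variable names, for illustration only.
ENV_KEYS = {
    "user": "POSTGRES_USER",
    "password": "POSTGRES_PASSWORD",
    "host": "POSTGRES_HOST",
    "port": "POSTGRES_PORT",
    "db_name": "POSTGRES_DBNAME",
}


def resolve_db_settings(config, env=None):
    """Return database settings, letting environment variables override config values."""
    env = os.environ if env is None else env
    resolved = dict(config)
    for key, env_name in ENV_KEYS.items():
        value = env.get(env_name)
        if value:  # an unset or empty variable leaves the config value in place
            resolved[key] = value
    return resolved


# Values as they might appear in the config file (hypothetical).
file_config = {"user": "postgres", "host": "localhost", "port": "5432", "db_name": "r2r"}
# A fake environment is passed in so the example does not depend on the real one.
fake_env = {"POSTGRES_HOST": "db.internal", "POSTGRES_PASSWORD": "s3cret"}

settings = resolve_db_settings(file_config, fake_env)
print(settings["host"])  # db.internal
```

Keeping the password out of the config file and supplying it only through the environment, as in `fake_env`, matches the security advice later on this page.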
Vector Database Operations
Initialization
The vector database is automatically initialized with dimensions that correspond to your selected embedding model when the Database Provider is first created.
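The effect of fixing the dimension at creation time can be sketched with a small in-memory store. This is not R2R's implementation: `FixedDimensionStore` is a hypothetical class that shows why vectors from a different embedding model are rejected.

```python
class FixedDimensionStore:
    """Hypothetical in-memory store whose vector dimension is fixed at creation."""

    def __init__(self, dimension):
        if dimension < 1:
            raise ValueError("dimension must be a positive integer")
        self.dimension = dimension
        self.entries = {}

    def upsert(self, entry_id, vector):
        # Reject vectors produced by a model with a different output size.
        if len(vector) != self.dimension:
            raise ValueError(
                f"expected {self.dimension} dimensions, got {len(vector)}"
            )
        self.entries[entry_id] = list(vector)


store = FixedDimensionStore(dimension=3)
store.upsert("a", [0.1, 0.2, 0.3])

try:
    store.upsert("b", [0.1, 0.2])  # wrong size, e.g. from another embedding model
except ValueError as exc:
    error_message = str(exc)
```

If you switch embedding models after the database has been initialized, the stored dimension and the new embeddings will likely disagree in the same way.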
Upsert Vector Entries
Search
Hybrid Search
Delete by Metadata
Relational Database Operations
User Management
Create User
Get User by Email
Update User
Document Management
Upsert Document Overview
Get Documents Overview
Advanced Features
Hybrid Search Function
The Database Provider includes a custom Postgres function for hybrid search, combining full-text and vector similarity search.
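One common way to combine a full-text ranking with a vector-similarity ranking is reciprocal rank fusion (RRF). The sketch below shows that technique in plain Python. Whether the Postgres function uses exactly this formula or this constant is an assumption, and `reciprocal_rank_fusion` is a name chosen for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids; each list contributes 1 / (k + rank) per id."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


full_text_ranking = ["doc_a", "doc_b", "doc_c"]  # e.g. from keyword search
semantic_ranking = ["doc_b", "doc_d", "doc_a"]   # e.g. from vector similarity

fused = reciprocal_rank_fusion([full_text_ranking, semantic_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that appear in both rankings (`doc_a`, `doc_b`) rise above those found by only one method, which is the main reason to use hybrid search.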
Token Blacklisting
For enhanced security, the provider supports token blacklisting:
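The idea behind token blacklisting can be sketched as a lookup of revoked tokens that is consulted before a token is accepted. The class below is an in-memory illustration with hypothetical names (`TokenBlacklist`, `revoke`, `is_revoked`, `purge_expired`). A real deployment would persist this data in the database.

```python
import time


class TokenBlacklist:
    """Hypothetical in-memory blacklist mapping each revoked token to its expiry time."""

    def __init__(self):
        self._expiry_by_token = {}

    def revoke(self, token, expires_at):
        # Remember the token until it would have expired on its own.
        self._expiry_by_token[token] = expires_at

    def is_revoked(self, token):
        return token in self._expiry_by_token

    def purge_expired(self, now=None):
        """Drop entries whose tokens have expired; return how many were removed."""
        now = time.time() if now is None else now
        expired = [t for t, exp in self._expiry_by_token.items() if exp <= now]
        for token in expired:
            del self._expiry_by_token[token]
        return len(expired)


blacklist = TokenBlacklist()
blacklist.revoke("token-123", expires_at=1_000.0)
revoked_before_purge = blacklist.is_revoked("token-123")
removed = blacklist.purge_expired(now=2_000.0)
```

Storing the expiry alongside the token keeps the blacklist from growing without bound, since an expired token is rejected by normal validation anyway.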
Security Best Practices
- Environment Variables: Use environment variables for sensitive information like database credentials.
- Connection Pooling: The provider uses connection pooling for efficient database connections.
- Prepared Statements: SQL queries use prepared statements to prevent SQL injection.
- Password Hashing: User passwords are hashed before storage using the configured Crypto Provider.
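Two of the practices above, parameterized queries and password hashing, can be sketched with the Python standard library. This example uses `sqlite3` as a stand-in for Postgres and PBKDF2 as a stand-in for whatever the configured Crypto Provider does. The table layout and the function names are assumptions, and Postgres drivers use a different placeholder syntax than the `?` shown here.

```python
import hashlib
import hmac
import os
import sqlite3

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password, salt=None):
    """Return (salt, digest) using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)  # constant-time comparison


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, salt BLOB, password_hash BLOB)")


def create_user(conn, email, password):
    salt, digest = hash_password(password)
    # Values are bound as parameters, never formatted into the SQL string.
    conn.execute(
        "INSERT INTO users (email, salt, password_hash) VALUES (?, ?, ?)",
        (email, salt, digest),
    )


def get_user_by_email(conn, email):
    return conn.execute(
        "SELECT email, salt, password_hash FROM users WHERE email = ?", (email,)
    ).fetchone()


create_user(conn, "user@example.com", "correct horse")
row = get_user_by_email(conn, "user@example.com")
injected = get_user_by_email(conn, "x' OR '1'='1")  # treated as a literal string
```

Because the email is bound as a parameter, the injection attempt matches no row instead of altering the query.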
Performance Considerations
- Indexing: The vector database automatically creates appropriate indexes for efficient similarity search.
- Batch Operations: Use batch operations like upsert_entries for better performance when dealing with multiple records.
- Connection Reuse: The provider reuses database connections to minimize overhead.
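The batching advice above can be sketched as a helper that groups records before each write. The `batched` helper is written for this example, and `fake_upsert_entries` is a hypothetical stand-in that only records how it was called. The real signature of upsert_entries is not shown on this page and is not assumed here.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


calls = []


def fake_upsert_entries(entries):
    # Stand-in for a batch write; one call represents one round trip.
    calls.append(len(entries))


records = [{"id": i, "vector": [float(i)] * 3} for i in range(5)]
for batch in batched(records, batch_size=2):
    fake_upsert_entries(batch)

print(calls)  # [2, 2, 1]
```

Five records are written in three calls instead of five. With realistic batch sizes the saving in round trips is much larger.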
Troubleshooting
Common issues and solutions:
- Connection Errors: Ensure your database credentials and connection details are correct.
- Dimension Mismatch: Verify that the vector dimension in your configuration matches your embeddings.
- Performance Issues: Consider optimizing your queries or adding appropriate indexes.
Conclusion
R2R’s Database Provider offers a powerful and flexible solution for managing both vector and relational data. By leveraging Postgres with pgvector, it provides efficient storage and retrieval of embeddings alongside traditional relational data, enabling advanced search capabilities and robust user management.