Troubleshooting Guide: Vector Storage Problems in R2R

Vector storage is a crucial component in R2R (RAG to Riches) for efficient similarity searches. This guide focuses on troubleshooting common vector storage issues, particularly with Postgres and pgvector.

1. Connection Issues

Symptom: R2R can’t connect to the vector database

  1. Check Postgres Connection:

    psql -h localhost -U your_username -d your_database
    

    If this fails, the issue might be with Postgres itself, not specifically vector storage.

  2. Verify Environment Variables: Ensure these are correctly set in your R2R configuration:

    • POSTGRES_USER
    • POSTGRES_PASSWORD
    • POSTGRES_HOST
    • POSTGRES_PORT
    • POSTGRES_DBNAME
    • POSTGRES_PROJECT_NAME
  3. Check Docker Network: If using Docker, ensure the R2R and Postgres containers are on the same network:

    docker network inspect r2r-network
    

2. pgvector Extension Issues

Symptom: “extension pgvector does not exist” error

  1. Check if pgvector is Installed: Connect to your database and run:

    SELECT * FROM pg_extension WHERE extname = 'vector';
    
  2. Install pgvector: If not installed, run:

    CREATE EXTENSION vector;
    
  3. Verify Postgres Version: pgvector requires Postgres 11 or later. Check your version:

    SELECT version();
    

3. Vector Dimension Mismatch

  1. Check Vector Dimensions: Verify the dimension of vectors you’re trying to insert matches your schema:

    SELECT * FROM information_schema.columns
    WHERE table_name = 'your_vector_table' AND data_type = 'vector';
    
  2. Verify R2R Configuration: Ensure the vector dimension in your R2R configuration matches your database schema.

  3. Recreate Table with Correct Dimensions: If dimensions are mismatched, you may need to recreate the table:

    DROP TABLE your_vector_table;
    CREATE TABLE your_vector_table (id bigserial PRIMARY KEY, embedding vector(384));
    

4. Performance Issues

Symptom: Slow similarity searches

  1. Check Index: Ensure you have an appropriate index:

    CREATE INDEX ON your_vector_table USING ivfflat (embedding vector_cosine_ops);
    
  2. Analyze Table: Run ANALYZE to update statistics:

    ANALYZE your_vector_table;
    
  3. Monitor Query Performance: Use EXPLAIN ANALYZE to check query execution plans:

    EXPLAIN ANALYZE SELECT * FROM your_vector_table
    ORDER BY embedding <=> '[your_vector]' LIMIT 10;
    
  4. Adjust Work Memory: If dealing with large vectors, increase work_mem:

    SET work_mem = '1GB';
    

5. Data Integrity Issues

Symptom: Unexpected search results or missing data

  1. Check Vector Normalization: Ensure vectors are normalized before insertion if using cosine similarity.

  2. Verify Data Insertion: Check if data is being correctly inserted:

    SELECT COUNT(*) FROM your_vector_table;
    
  3. Inspect Random Samples: Look at some random entries to ensure data quality:

    SELECT * FROM your_vector_table ORDER BY RANDOM() LIMIT 10;
    

6. Disk Space Issues

Symptom: Insertion failures or database unresponsiveness

  1. Check Disk Space:

    df -h
    
  2. Monitor Postgres Disk Usage:

    SELECT pg_size_pretty(pg_database_size('your_database'));
    
  3. Identify Large Tables:

    SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
    FROM pg_catalog.pg_statio_user_tables
    ORDER BY pg_total_relation_size(relid) DESC;
    

7. Backup and Recovery

If all else fails, you may need to restore from a backup:

  1. Create a Backup:

    pg_dump -h localhost -U your_username -d your_database > backup.sql
    
  2. Restore from Backup:

    psql -h localhost -U your_username -d your_database < backup.sql
    

Getting Further Help

If these steps don’t resolve your issue:

  1. Check R2R logs for more detailed error messages.
  2. Consult the pgvector documentation for advanced troubleshooting.
  3. Reach out to the R2R community or support channels with detailed information about your setup and the steps you’ve tried.

Remember to always backup your data before making significant changes to your database or vector storage configuration.