Troubleshooting Guide: Vector Storage Problems in R2R

Vector storage is a crucial component in R2R (RAG to Riches) for efficient similarity searches. This guide focuses on troubleshooting common vector storage issues, particularly with Postgres and pgvector.

1. Connection Issues

Symptom: R2R can’t connect to the vector database

  1. Check Postgres Connection:

    $psql -h localhost -U your_username -d your_database

    If this fails, the issue might be with Postgres itself, not specifically vector storage.

  2. Verify Environment Variables: Ensure these are correctly set in your R2R configuration:

    • R2R_POSTGRES_USER
    • R2R_POSTGRES_PASSWORD
    • R2R_POSTGRES_HOST
    • R2R_POSTGRES_PORT
    • R2R_POSTGRES_DBNAME
    • R2R_PROJECT_NAME
  3. Check Docker Network: If using Docker, ensure the R2R and Postgres containers are on the same network:

    $docker network inspect r2r-network

2. pgvector Extension Issues

Symptom: “extension pgvector does not exist” error

  1. Check if pgvector is Installed: Connect to your database and run:

    1SELECT * FROM pg_extension WHERE extname = 'vector';
  2. Install pgvector: If not installed, run:

    1CREATE EXTENSION vector;
  3. Verify Postgres Version: pgvector requires Postgres 11 or later. Check your version:

    1SELECT version();

3. Vector Dimension Mismatch

  1. Check Vector Dimensions: Verify the dimension of vectors you’re trying to insert matches your schema:

    1SELECT * FROM information_schema.columns
    2WHERE table_name = 'your_vector_table' AND data_type = 'vector';
  2. Verify R2R Configuration: Ensure the vector dimension in your R2R configuration matches your database schema.

  3. Recreate Table with Correct Dimensions: If dimensions are mismatched, you may need to recreate the table:

    1DROP TABLE your_vector_table;
    2CREATE TABLE your_vector_table (id bigserial PRIMARY KEY, embedding vector(384));

4. Performance Issues

Symptom: Slow similarity searches

  1. Check Index: Ensure you have an appropriate index:

    1CREATE INDEX ON your_vector_table USING ivfflat (embedding vector_cosine_ops);
  2. Analyze Table: Run ANALYZE to update statistics:

    1ANALYZE your_vector_table;
  3. Monitor Query Performance: Use EXPLAIN ANALYZE to check query execution plans:

    1EXPLAIN ANALYZE SELECT * FROM your_vector_table
    2ORDER BY embedding <=> '[your_vector]' LIMIT 10;
  4. Adjust Work Memory: If dealing with large vectors, increase work_mem:

    1SET work_mem = '1GB';

5. Data Integrity Issues

Symptom: Unexpected search results or missing data

  1. Check Vector Normalization: Ensure vectors are normalized before insertion if using cosine similarity.

  2. Verify Data Insertion: Check if data is being correctly inserted:

    1SELECT COUNT(*) FROM your_vector_table;
  3. Inspect Random Samples: Look at some random entries to ensure data quality:

    1SELECT * FROM your_vector_table ORDER BY RANDOM() LIMIT 10;

6. Disk Space Issues

Symptom: Insertion failures or database unresponsiveness

  1. Check Disk Space:

    $df -h
  2. Monitor Postgres Disk Usage:

    1SELECT pg_size_pretty(pg_database_size('your_database'));
  3. Identify Large Tables:

    1SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
    2FROM pg_catalog.pg_statio_user_tables
    3ORDER BY pg_total_relation_size(relid) DESC;

7. Backup and Recovery

If all else fails, you may need to restore from a backup:

  1. Create a Backup:

    $pg_dump -h localhost -U your_username -d your_database > backup.sql
  2. Restore from Backup:

    $psql -h localhost -U your_username -d your_database < backup.sql

Getting Further Help

If these steps don’t resolve your issue:

  1. Check R2R logs for more detailed error messages.
  2. Consult the pgvector documentation for advanced troubleshooting.
  3. Reach out to the R2R community or support channels with detailed information about your setup and the steps you’ve tried.

Remember to always backup your data before making significant changes to your database or vector storage configuration.