How R2R works

Core Architecture

R2R operates as a distributed system with several key components:

API Layer

  • RESTful API for all operations
  • Authentication and access control
  • Request routing and validation

Storage Layer

  • Document storage
  • Vector embeddings
  • User and permission data
  • Knowledge graphs

Processing Pipeline

  • Document parsing
  • Chunking and embedding
  • Relationship extraction
  • Task orchestration

Document Processing Pipeline

When you ingest a document, R2R runs it through four stages:

  1. Document Parsing

    • Files are processed based on type (PDF, text, images, etc.)
    • Text is extracted and cleaned
    • Metadata is preserved
  2. Chunking

    • Documents are split into semantic units
    • Chunk size and overlap are configurable
    • Headers and structure are maintained
  3. Embedding Generation

    • Each chunk is converted to a vector embedding
    • Multiple embedding models are supported
    • Embeddings are optimized for search
  4. Knowledge Graph Creation

    • Relationships between chunks are identified
    • Entities are extracted and linked
    • Graph structure is built and maintained
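
The chunking stage (step 2 above) can be illustrated with a minimal sketch. It assumes a plain character-window splitter, which is simpler than the semantic splitting described above; `chunk_text`, `chunk_size`, and `overlap` are hypothetical names chosen for illustration, not R2R's actual API.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: a stand-in for a real, structure-aware chunker.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.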

Search and Retrieval System

R2R combines vector search with keyword search and fuses the two result lists into a single ranking:

Vector Search

  • High-dimensional vector similarity search
  • Optimized indices for fast retrieval
  • Configurable distance metrics
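
To make "configurable distance metrics" concrete, here is a minimal brute-force sketch in pure Python. It scans every vector rather than using an optimized index, and the names `METRICS` and `search` are assumptions made for this example rather than part of R2R.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical registry: the metric is selected by name at query time.
METRICS = {"cosine": cosine_distance, "l2": euclidean_distance}


def search(query, index, k=3, metric="cosine"):
    """Return the k (chunk_id, distance) pairs closest to the query."""
    distance = METRICS[metric]
    scored = [(chunk_id, distance(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:k]


if __name__ == "__main__":
    index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(search([1.0, 0.0], index, k=2, metric="cosine"))
```

A production index avoids the full scan by using an approximate nearest-neighbor structure, trading a little recall for much lower latency.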

Hybrid Search

Query → [Vector Search Branch]  → Semantic Results
      → [Keyword Search Branch] → Lexical Results
      → [Fusion Layer]          → Final Ranked Results

Ranking

  • Reciprocal rank fusion
  • Configurable weights
  • Result deduplication
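
The fusion step can be sketched with weighted reciprocal rank fusion. This is a minimal illustration, not R2R's implementation: the function name, the `weights` parameter, and the constant `k = 60` (a commonly used default for this technique) are assumptions here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse several ranked lists of chunk ids (best first) into one ranking.

    Each list contributes weight / (k + rank) to a chunk's score, so a chunk
    that appears in both lists is merged into a single entry.
    """
    if weights is None:
        weights = [1.0] * len(rankings)

    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        seen = set()
        for rank, chunk_id in enumerate(ranking, start=1):
            if chunk_id in seen:  # ignore repeats inside one list
                continue
            seen.add(chunk_id)
            scores[chunk_id] += weight / (k + rank)

    # Highest score first; ties are broken by id for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


if __name__ == "__main__":
    semantic = ["c1", "c2", "c3"]
    lexical = ["c3", "c1", "c4"]
    print(reciprocal_rank_fusion([semantic, lexical]))
    print(reciprocal_rank_fusion([semantic, lexical], weights=[1.0, 3.0]))
```

Because the scores depend only on ranks, the two branches do not need comparable raw scores, which is why this method suits mixing semantic and lexical results.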

Response Generation

When generating a response, R2R works through three steps:

  1. Context Building

    • Relevant chunks are retrieved
    • Context is formatted for the LLM
    • Citations are prepared
  2. LLM Integration

    • Context is combined with the query
    • System prompts guide response format
    • Responses can be streamed in real time
  3. Post-processing

    • Response validation
    • Citation linking
    • Format cleaning
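
The context-building and citation-linking steps above can be sketched as follows. The prompt wording, the chunk dictionary keys (`text`, `document_id`), and both function names are hypothetical; the sketch stops short of calling an LLM.

```python
import re


def build_prompt(query: str, chunks: list[dict]) -> tuple[str, dict[int, str]]:
    """Number each retrieved chunk and assemble a prompt for the LLM.

    Returns the prompt plus a map from citation number to source document.
    """
    context_lines = []
    citations = {}
    for number, chunk in enumerate(chunks, start=1):
        context_lines.append(f"[{number}] {chunk['text']}")
        citations[number] = chunk["document_id"]

    prompt = (
        "Answer the question using only the context below. "
        "Cite sources with their bracketed numbers.\n\n"
        "Context:\n" + "\n".join(context_lines) + f"\n\nQuestion: {query}"
    )
    return prompt, citations


def link_citations(response: str, citations: dict[int, str]) -> dict[int, str]:
    """Post-processing: keep only citation markers that map to a real chunk."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", response)}
    return {n: citations[n] for n in sorted(cited) if n in citations}


if __name__ == "__main__":
    chunks = [
        {"text": "Paris is the capital of France.", "document_id": "doc-a"},
        {"text": "Berlin is the capital of Germany.", "document_id": "doc-b"},
    ]
    prompt, citations = build_prompt("What is the capital of France?", chunks)
    print(prompt)
    print(link_citations("The capital is Paris [1].", citations))
```

Dropping markers that match no retrieved chunk is one simple form of response validation: the model cannot cite a source it was never given.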

System Components

R2R consists of several integrated services:

Core Services

┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│     API     │ ↔ │  Processor  │ ↔ │   Storage   │
└─────────────┘   └─────────────┘   └─────────────┘
       ↕                 ↕                 ↕
┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│ Auth Server │ ↔ │ Orchestrator│ ↔ │   Search    │
└─────────────┘   └─────────────┘   └─────────────┘

Database Layer

  • PostgreSQL for structured data
  • pgvector for vector storage
  • Graph data for relationships
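
As a rough illustration of how a similarity query against pgvector might look, the sketch below builds a parameterized SQL string in Python. The operators `<->`, `<#>`, and `<=>` are pgvector's L2 distance, negative inner product, and cosine distance operators; the table name `chunks`, its columns, and the helper name are assumptions for this example and do not reflect R2R's actual schema.

```python
# pgvector distance operators, keyed by a metric name chosen for this sketch.
PGVECTOR_OPERATORS = {
    "l2": "<->",             # Euclidean distance
    "inner_product": "<#>",  # negative inner product
    "cosine": "<=>",         # cosine distance
}


def build_similarity_query(metric: str = "cosine") -> str:
    """Return a parameterized nearest-neighbor query for a hypothetical table.

    The two %s placeholders are meant to be bound by the database driver:
    first the query vector (for example the string '[0.1,0.2,0.3]'),
    then the row limit.
    """
    operator = PGVECTOR_OPERATORS[metric]  # KeyError on an unknown metric
    return (
        f"SELECT id, text, embedding {operator} %s::vector AS distance "
        "FROM chunks ORDER BY distance LIMIT %s"
    )


if __name__ == "__main__":
    print(build_similarity_query("cosine"))
```

Only the operator is interpolated into the string, and it comes from a fixed dictionary; the vector and the limit stay as bound parameters so user input never reaches the SQL text.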

External Integrations

  • LLM providers (OpenAI, Anthropic, etc.)
  • Authentication providers
  • Storage systems

Performance Considerations

R2R optimizes for several key metrics:

Latency

  • Cached embeddings
  • Optimized vector indices
  • Request batching
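
Cached embeddings and request batching work well together, as the minimal sketch below suggests. `EmbeddingCache` and the `embed_batch` callable are hypothetical; any function that maps a list of texts to a list of vectors could be plugged in.

```python
import hashlib


class EmbeddingCache:
    """Cache embeddings by content hash and embed all misses in one batch."""

    def __init__(self, embed_batch):
        # embed_batch: callable taking list[str] and returning list[list[float]]
        self._embed_batch = embed_batch
        self._cache = {}

    def get_many(self, texts: list[str]) -> list[list[float]]:
        keys = [hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]

        # Collect each uncached text once, preserving first-seen order.
        missing = {}
        for key, text in zip(keys, texts):
            if key not in self._cache:
                missing.setdefault(key, text)

        if missing:
            # One batched call instead of one call per text.
            vectors = self._embed_batch(list(missing.values()))
            for key, vector in zip(missing, vectors):
                self._cache[key] = vector

        return [self._cache[key] for key in keys]


if __name__ == "__main__":
    def fake_embed(batch):
        # Stand-in for a real model: embeds a text as its length.
        return [[float(len(t))] for t in batch]

    cache = EmbeddingCache(fake_embed)
    print(cache.get_many(["a", "bb", "a"]))
```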

Scalability

  • Horizontal scaling support
  • Distributed processing
  • Load balancing

Reliability

  • Task queuing
  • Error handling
  • Automatic retries
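
Automatic retries are commonly implemented with exponential backoff. The sketch below shows that general pattern under stated assumptions: `run_with_retries`, its parameters, and the delay schedule are illustrative choices, not R2R's actual retry policy.

```python
import time


def run_with_retries(task, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call task() until it succeeds or max_attempts is reached.

    The wait doubles after each failure: base_delay, 2 * base_delay, ...
    The sleep function is injectable so tests need not actually wait.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_task():
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("transient failure")
        return "done"

    print(run_with_retries(flaky_task, base_delay=0.01))
```

Catching every `Exception` keeps the sketch short; a real system would usually retry only errors known to be transient, such as timeouts.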

Resource Management

R2R manages resources along three dimensions:

  1. Memory Usage

    • Vector index optimization
    • Chunk size management
    • Cache control
  2. Processing Power

    • Parallel processing
    • Batch operations
    • Priority queuing
  3. Storage

    • Efficient vector storage
    • Document versioning
    • Metadata indexing
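
Priority queuing and batch operations, listed under Processing Power above, can be combined in a small sketch. `PriorityTaskQueue` is a hypothetical class built on Python's standard `heapq` module; lower numbers mean higher priority in this example.

```python
import heapq
import itertools


class PriorityTaskQueue:
    """Queue that hands out tasks in priority order, a batch at a time."""

    def __init__(self):
        self._heap = []
        # The counter breaks ties so equal-priority tasks stay first in, first out.
        self._counter = itertools.count()

    def push(self, task, priority=10):
        heapq.heappush(self._heap, (priority, next(self._counter), task))

    def pop_batch(self, batch_size):
        """Remove and return up to batch_size of the most urgent tasks."""
        batch = []
        while self._heap and len(batch) < batch_size:
            _, _, task = heapq.heappop(self._heap)
            batch.append(task)
        return batch


if __name__ == "__main__":
    queue = PriorityTaskQueue()
    queue.push("embed-a", priority=5)
    queue.push("parse-b", priority=1)
    queue.push("embed-c", priority=5)
    print(queue.pop_batch(2))
    print(queue.pop_batch(2))
```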

For detailed deployment configurations and optimization strategies, refer to our Configuration Guide.