Documents

A Document in R2R is the system’s digital representation of any piece of content you ingest, such as a PDF report, webpage text, an image, or an audio file. It is the central container for the Chunks, Entities, and Relationships derived from that content.

Key processes associated with a Document include:

  • Ingestion: Content is accepted from various formats (.pdf, .docx, .txt, .png, .mp3, etc.) via file upload, raw text, or pre-defined chunks.
  • Chunking: The document’s content is broken down into smaller, searchable Chunks.
  • Metadata & Collections: Documents are associated with descriptive metadata (e.g., title, source) and organized into Collections for access control.
  • Enrichment (Optional): The system can extract Entities and Relationships for knowledge graphs or generate embeddings for semantic search.
  • Status Tracking: The status of ingestion and enrichment is tracked on the document and returned with its details.
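
To make the lifecycle above concrete, here is a minimal, self-contained Python sketch of a document that is ingested from raw text, chunked, and given a status. It is a toy model, not R2R's implementation: the `ToyDocument` class, the `chunk_text` and `ingest` helpers, the fixed-size overlapping chunking strategy, and the status labels "pending" and "success" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a Document; not R2R's actual schema."""
    id: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)
    collection_ids: List[str] = field(default_factory=list)
    chunks: List[str] = field(default_factory=list)
    status: str = "pending"  # hypothetical status label


def chunk_text(text: str, size: int, overlap: int = 0) -> List[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    step = size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


def ingest(doc_id: str, text: str, metadata: Dict[str, str],
           size: int = 200, overlap: int = 20) -> ToyDocument:
    """Create a document from raw text, chunk it, and mark it as done."""
    doc = ToyDocument(id=doc_id, text=text, metadata=dict(metadata))
    doc.chunks = chunk_text(text, size, overlap)
    doc.status = "success"  # hypothetical status label
    return doc


doc = ingest("doc-1", "abcdefghij", {"title": "Example"}, size=4, overlap=1)
print(doc.status, doc.chunks)  # prints: success ['abcd', 'defg', 'ghij']
```

The overlap keeps a little shared context between neighbouring chunks, which is a common choice for retrieval; the chunking strategy a real deployment uses may differ.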

Essentially, the Document object is R2R’s foundational unit for turning your raw information into searchable, analyzable knowledge for RAG and agentic workflows.

Core Document Endpoints

This section provides a high-level overview. See the detailed endpoint documentation below for request/response schemas and examples.

Method  Endpoint                              Description
POST    /documents                            Ingest new information (file, text, or chunks) as a document.
GET     /documents                            List existing documents with pagination and filtering.
GET     /documents/{id}                       Retrieve details (metadata, status) about a specific document.
GET     /documents/{id}/download              Download the original source file of a document.
GET     /documents/{id}/chunks                List the text Chunks derived from a document’s content.
PATCH   /documents/{id}/metadata              Add or update metadata for a document.
PUT     /documents/{id}/metadata              Replace all metadata for a document.
DELETE  /documents/{id}                       Delete a specific document and its associated data.
DELETE  /documents/by-filter                  Delete multiple documents matching filter criteria.
POST    /documents/search                     Search across generated document summaries.
GET     /documents/download_zip               Download multiple original document files as a zip archive.
POST    /documents/export                     Export document metadata to CSV (superuser).
POST    /documents/{id}/extract               Start knowledge graph entity/relationship extraction for a document.
GET     /documents/{id}/entities              List Entities identified within a document.
POST    /documents/{id}/entities/export       Export a document’s entities to CSV (superuser).
GET     /documents/{id}/relationships         List Relationships identified within a document.
POST    /documents/{id}/relationships/export  Export a document’s relationships to CSV (superuser).
POST    /documents/{id}/deduplicate           Start entity deduplication process for a document’s entities.
GET     /documents/{id}/collections           List Collections that contain a specific document (superuser).
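
As a rough illustration of how rows in this table translate into HTTP calls, the sketch below builds (but does not send) requests using only Python's standard library. The base URL, the operation names in `ENDPOINTS`, the `build_request` helper, and the example document ID are assumptions for illustration and are not part of R2R; authentication and request bodies are left out because their shape is defined in the detailed endpoint documentation.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

# Assumption: adjust to wherever your R2R server is actually reachable.
BASE_URL = "http://localhost:7272/v3"

# A few rows from the table, keyed by hypothetical operation names.
ENDPOINTS = {
    "list_documents": ("GET", "/documents"),
    "get_document": ("GET", "/documents/{id}"),
    "list_chunks": ("GET", "/documents/{id}/chunks"),
    "delete_document": ("DELETE", "/documents/{id}"),
}


def build_request(operation: str, doc_id: Optional[str] = None) -> Request:
    """Return an unsent urllib Request for one of the operations above."""
    method, template = ENDPOINTS[operation]
    path = template
    if "{id}" in template:
        if doc_id is None:
            raise ValueError(f"{operation} requires a document id")
        # Percent-encode the id so it is safe to place in the URL path.
        path = template.replace("{id}", quote(doc_id, safe=""))
    return Request(BASE_URL + path, headers={"Accept": "application/json"}, method=method)


req = build_request("list_chunks", doc_id="doc-1")
print(req.get_method(), req.full_url)
# prints: GET http://localhost:7272/v3/documents/doc-1/chunks
```

To actually send a request you would pass it to `urllib.request.urlopen`, typically after adding whatever credentials your deployment requires.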