Documents

A Document in R2R is the system’s digital representation of any piece of content you ingest, such as a PDF report, webpage text, an image, or an audio file. It acts as the central container for the Chunks, Entities, and Relationships derived from that content.

Key processes associated with a Document include:

Ingestion: Content is accepted from various formats (.pdf, .docx, .txt, .png, .mp3, etc.) via file upload, raw text, or pre-defined chunks.
Chunking: The document’s content is broken down into smaller, searchable Chunks.
Metadata & Collections: Documents are associated with descriptive metadata (e.g., title, source) and organized into Collections for access control.
Enrichment (Optional): The system can extract Entities and Relationships for knowledge graphs or generate embeddings for semantic search.
Status Tracking: The progress of ingestion and enrichment is tracked per document and reported alongside the document’s details.
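
The three ingestion inputs listed above (file upload, raw text, or pre-defined chunks) can be sketched as a small request model. This is only an illustration: the `IngestRequest` class, its field names (`file_path`, `raw_text`, `chunks`, `metadata`), and the rule that exactly one input must be supplied are assumptions made for this sketch, not R2R's documented schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IngestRequest:
    """Hypothetical model of one ingestion call (not R2R's actual schema)."""

    file_path: Optional[str] = None      # e.g. a .pdf, .docx, or .mp3 file
    raw_text: Optional[str] = None       # plain text supplied directly
    chunks: Optional[List[str]] = None   # pre-defined chunks
    metadata: dict = field(default_factory=dict)  # e.g. title, source

    def source_kind(self) -> str:
        """Return which input is set; this sketch assumes exactly one is allowed."""
        provided = [
            name
            for name, value in (
                ("file", self.file_path),
                ("text", self.raw_text),
                ("chunks", self.chunks),
            )
            if value is not None
        ]
        if len(provided) != 1:
            raise ValueError(
                f"expected exactly one of file, text, or chunks; got {provided}"
            )
        return provided[0]


request = IngestRequest(raw_text="Quarterly results ...", metadata={"title": "Q3 report"})
print(request.source_kind())  # text
```

Validating the input kind before sending anything keeps a malformed call from reaching the server; whether the real API enforces the same exclusivity should be checked against the detailed endpoint documentation.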

Essentially, the Document object is R2R’s foundational unit for turning your raw information into searchable, analyzable knowledge for RAG and agentic workflows.

Core Document Endpoints

This section provides a high-level overview. See the detailed endpoint documentation below for request/response schemas and examples.

| Method | Endpoint | Description |
| --- | --- | --- |
| `POST` | `/documents` | Ingest new information (file, text, or chunks) as a document. |
| `GET` | `/documents` | List existing documents with pagination and filtering. |
| `GET` | `/documents/{id}` | Retrieve details (metadata, status) about a specific document. |
| `GET` | `/documents/{id}/download` | Download the original source file of a document. |
| `GET` | `/documents/{id}/chunks` | List the text `Chunks` derived from a document’s content. |
| `PATCH` | `/documents/{id}/metadata` | Add or update `metadata` for a document. |
| `PUT` | `/documents/{id}/metadata` | Replace all `metadata` for a document. |
| `DELETE` | `/documents/{id}` | Delete a specific document and its associated data. |
| `DELETE` | `/documents/by-filter` | Delete multiple documents matching filter criteria. |
| `POST` | `/documents/search` | Search across generated document summaries. |
| `GET` | `/documents/download_zip` | Download multiple original document files as a zip archive. |
| `POST` | `/documents/export` | Export document metadata to CSV (superuser). |
| `POST` | `/documents/{id}/extract` | Start knowledge graph entity/relationship extraction for a document. |
| `GET` | `/documents/{id}/entities` | List `Entities` identified within a document. |
| `POST` | `/documents/{id}/entities/export` | Export a document’s entities to CSV (superuser). |
| `GET` | `/documents/{id}/relationships` | List `Relationships` identified within a document. |
| `POST` | `/documents/{id}/relationships/export` | Export a document’s relationships to CSV (superuser). |
| `POST` | `/documents/{id}/deduplicate` | Start entity deduplication process for a document’s entities. |
| `GET` | `/documents/{id}/collections` | List `Collections` that contain a specific document (superuser). |
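
To make the table concrete, the sketch below builds (but does not send) requests for the listed endpoints using Python's standard library. Only the HTTP methods and paths come from the table. The base URL `http://localhost:7272/v3`, the bearer-token header, the helper names, and the JSON body handling are assumptions for illustration. The two metadata helpers mirror the `PATCH` ("add or update") and `PUT` ("replace all") descriptions locally, assuming a shallow merge for `PATCH`.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:7272/v3"  # assumed; adjust to your deployment


def build_request(method: str, path: str, doc_id: Optional[str] = None,
                  body: Optional[dict] = None,
                  token: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) a request for one endpoint from the table."""
    if "{id}" in path:
        if doc_id is None:
            raise ValueError(f"{path} requires a document id")
        path = path.replace("{id}", doc_id)
    headers = {"Accept": "application/json"}
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"  # assumed auth scheme
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")  # body shape is hypothetical
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def patch_metadata(existing: dict, updates: dict) -> dict:
    """PATCH as described: add or update keys (shallow merge assumed)."""
    return {**existing, **updates}


def put_metadata(existing: dict, replacement: dict) -> dict:
    """PUT as described: the previous metadata is discarded entirely."""
    return dict(replacement)


doc_id = "doc-123"  # placeholder id
req = build_request("GET", "/documents/{id}/chunks", doc_id)
print(req.get_method(), req.full_url)
# GET http://localhost:7272/v3/documents/doc-123/chunks

current = {"title": "Q3 report", "source": "finance"}
print(patch_metadata(current, {"source": "archive"}))
# {'title': 'Q3 report', 'source': 'archive'}
print(put_metadata(current, {"source": "archive"}))
# {'source': 'archive'}
```

A request built this way could be sent with `urllib.request.urlopen(req)` against a running server. The practical difference between the two metadata calls is visible in the last two lines: `PATCH` keeps the existing `title`, while `PUT` drops it.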