Documents
A Document in R2R is the system’s digital representation of any piece of content you ingest, such as a PDF report, webpage text, an image, or an audio file. It acts as the central container for downstream Chunks, Entities, and more.
Key processes associated with a Document include:
- Ingestion: Content is accepted from various formats (.pdf, .docx, .txt, .png, .mp3, etc.) via file upload, raw text, or pre-defined chunks.
- Chunking: The document’s content is broken down into smaller, searchable Chunks.
- Metadata & Collections: Documents are associated with descriptive metadata (e.g., title, source) and organized into Collections for access control.
- Enrichment (Optional): The system can extract Entities and Relationships for knowledge graphs or generate embeddings for semantic search.
- Status Tracking: Ingestion and enrichment processes are monitored.
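To make the lifecycle above concrete, here is a minimal, self-contained Python sketch of how a document might move from raw text to chunks while carrying metadata, collection membership, and a status field. It is an illustration only and does not call R2R: the `Document` and `Chunk` classes, the `split_text` and `ingest_raw_text` helpers, the fixed-size character chunking, and the status strings are all hypothetical names and choices made for this example, not R2R's actual schema or chunking strategy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from uuid import uuid4


@dataclass
class Chunk:
    """Hypothetical chunk record: one searchable slice of a document."""
    document_id: str
    index: int
    text: str


@dataclass
class Document:
    """Hypothetical document record (not R2R's real schema)."""
    id: str
    metadata: Dict[str, str]
    collection_ids: List[str] = field(default_factory=list)
    ingestion_status: str = "pending"  # illustrative status value
    chunks: List[Chunk] = field(default_factory=list)


def split_text(text: str, size: int = 40, overlap: int = 10) -> List[str]:
    """Split text into fixed-size character windows that overlap."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    step = size - overlap
    pieces: List[str] = []
    start = 0
    while start < len(text):
        pieces.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return pieces


def ingest_raw_text(
    text: str,
    metadata: Dict[str, str],
    collection_ids: Optional[List[str]] = None,
    size: int = 40,
    overlap: int = 10,
) -> Document:
    """Create a document from raw text, chunk it, and track its status."""
    doc = Document(
        id=str(uuid4()),
        metadata=dict(metadata),
        collection_ids=list(collection_ids or []),
    )
    doc.ingestion_status = "chunking"  # illustrative intermediate status
    doc.chunks = [
        Chunk(document_id=doc.id, index=i, text=piece)
        for i, piece in enumerate(split_text(text, size, overlap))
    ]
    doc.ingestion_status = "success"
    return doc


if __name__ == "__main__":
    example = ingest_raw_text(
        "R2R turns raw content into searchable knowledge for RAG workflows.",
        metadata={"title": "Example note", "source": "manual entry"},
        collection_ids=["team-docs"],
    )
    print(example.ingestion_status, len(example.chunks))
    for chunk in example.chunks:
        print(chunk.index, repr(chunk.text))
```

The sketch keeps every chunk linked to its parent through `document_id`, which mirrors the idea of the Document acting as the container for downstream objects. A real deployment would rely on R2R's own ingestion pipeline for parsing, chunking, and enrichment.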
Essentially, the Document object is R2R’s foundational unit for turning your raw information into searchable, analyzable knowledge for RAG and agentic workflows.
Core Document Endpoints
This section provides a high-level overview. See the detailed endpoint documentation below for request/response schemas and examples.