This SDK documentation occasionally falls out of date. Cross-check it against the automatically generated API Reference documentation for the latest parameters.

Document Ingestion and Management

Ingest Files

Ingest files or directories into your R2R system:

file_paths = ['path/to/file1.txt', 'path/to/file2.txt']
metadatas = [{'key1': 'value1'}, {'key2': 'value2'}]

ingest_response = client.ingest_files(
    file_paths=file_paths,
    metadatas=metadatas,
    # optionally override chunking settings at runtime
    ingestion_config={
        "provider": "unstructured_local",
        "strategy": "auto",
        "chunking_strategy": "by_title",
        "new_after_n_chars": 256, # soft maximum
        "max_characters": 512, # hard maximum
        "combine_under_n_chars": 64, # hard minimum
        "overlap": 100,
    }
)

Refer to the ingestion configuration section for comprehensive details on available options.

file_paths
list[str]
required

A list of file paths or directory paths to ingest. If a directory path is provided, all files within the directory and its subdirectories will be ingested.
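
As a rough illustration of the directory behavior described above, the sketch below expands a mixed list of file and directory paths into a flat list of files. The helper name `expand_paths` is hypothetical and not part of the R2R SDK; the client may resolve directories differently (for example, in ordering or in how hidden files are treated).

```python
from pathlib import Path


def expand_paths(paths: list[str]) -> list[str]:
    """Expand directories (recursively) into the files they contain.

    Hypothetical helper for illustration only; not part of the R2R SDK.
    """
    expanded: list[str] = []
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            # rglob("*") walks the directory and all of its subdirectories.
            expanded.extend(str(p) for p in sorted(path.rglob("*")) if p.is_file())
        else:
            # Plain file paths are passed through unchanged.
            expanded.append(str(path))
    return expanded
```

Running a helper like this before ingestion is one way to know how many files a directory path will contribute, which matters for the per-file lists such as metadatas.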

metadatas
Optional[list[dict]]

An optional list of metadata dictionaries corresponding to each file. If provided, the length should match the number of files being ingested.

document_ids
Optional[list[Union[UUID, str]]]

An optional list of document IDs to assign to the ingested files. If provided, the length should match the number of files being ingested.

versions
Optional[list[str]]

An optional list of version strings for the ingested files. If provided, the length should match the number of files being ingested.
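
Because metadatas, document_ids, and versions are each expected to line up one-to-one with the files being ingested, a small client-side check can catch mismatches before a request is sent. The following is a minimal sketch; `check_parallel_lengths` is a hypothetical helper, not an SDK function, and it assumes file_paths contains individual files rather than directories.

```python
from typing import Optional, Sequence


def check_parallel_lengths(
    file_paths: Sequence[str],
    **per_file_lists: Optional[Sequence],
) -> None:
    """Raise ValueError if any provided per-file list differs in length from file_paths."""
    expected = len(file_paths)
    for name, values in per_file_lists.items():
        # Lists left as None are optional and therefore skipped.
        if values is not None and len(values) != expected:
            raise ValueError(
                f"{name} has {len(values)} entries, expected {expected} (one per file)"
            )


file_paths = ["path/to/file1.txt", "path/to/file2.txt"]
metadatas = [{"key1": "value1"}, {"key2": "value2"}]

# Passes silently: two files, two metadata dictionaries, other lists omitted.
check_parallel_lengths(file_paths, metadatas=metadatas, document_ids=None, versions=None)
```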

ingestion_config
Optional[Union[dict, IngestionConfig]]

The ingestion config override parameter enables developers to customize their R2R chunking strategy at runtime.
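
When overriding chunking settings, it can help to sanity-check the size limits before sending them. The sketch below assumes, based only on the inline comments in the example above, that the soft maximum should not exceed the hard maximum, that the combine threshold should not exceed the soft maximum, and that the overlap should be smaller than the hard maximum. These constraints and the helper name `check_chunk_sizes` are assumptions for illustration, not rules confirmed by R2R.

```python
def check_chunk_sizes(config: dict) -> list[str]:
    """Return human-readable problems found in the chunk size settings.

    Hypothetical helper; the constraints checked here are assumptions.
    """
    problems: list[str] = []
    soft = config.get("new_after_n_chars")
    hard = config.get("max_characters")
    minimum = config.get("combine_under_n_chars")
    overlap = config.get("overlap")

    if soft is not None and hard is not None and soft > hard:
        problems.append("new_after_n_chars (soft maximum) exceeds max_characters")
    if minimum is not None and soft is not None and minimum > soft:
        problems.append("combine_under_n_chars exceeds new_after_n_chars")
    if overlap is not None and hard is not None and overlap >= hard:
        problems.append("overlap is not smaller than max_characters")
    return problems


ingestion_config = {
    "provider": "unstructured_local",
    "strategy": "auto",
    "chunking_strategy": "by_title",
    "new_after_n_chars": 256,
    "max_characters": 512,
    "combine_under_n_chars": 64,
    "overlap": 100,
}
problems = check_chunk_sizes(ingestion_config)  # empty list for the values above
```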

Ingest Chunks

Ingest pre-parsed text chunks into your R2R system:

chunks = [
  {
    "text": "Aristotle was a Greek philosopher...",
  },
  ...,
  {
    "text": "He was born in 384 BC in Stagira...",
  }
]

ingest_response = client.ingest_chunks(
  chunks=chunks,
  metadata={"title": "Aristotle", "source": "wikipedia"}
)

chunks
list[dict]
required

A list of chunk dictionaries to ingest. Each dictionary should contain at least a “text” key with the chunk text. An optional “metadata” key can contain a dictionary of metadata for the chunk.
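
The chunk format described above can be assembled from plain strings. This is a minimal sketch; `make_chunks` is a hypothetical helper rather than an SDK function, and it relies only on the "text" and "metadata" keys named in the description.

```python
from typing import Optional


def make_chunks(texts: list[str], metadatas: Optional[list[dict]] = None) -> list[dict]:
    """Build chunk dictionaries with a "text" key and an optional "metadata" key."""
    if metadatas is not None and len(metadatas) != len(texts):
        raise ValueError("metadatas must contain one dictionary per text")
    chunks: list[dict] = []
    for index, text in enumerate(texts):
        chunk: dict = {"text": text}
        if metadatas is not None:
            # Per-chunk metadata is attached only when it was supplied.
            chunk["metadata"] = metadatas[index]
        chunks.append(chunk)
    return chunks


chunks = make_chunks(
    ["Aristotle was a Greek philosopher...", "He was born in 384 BC in Stagira..."],
    metadatas=[{"position": 0}, {"position": 1}],
)
```

The resulting list has the same shape as the chunks argument in the example above; the "position" metadata key is purely illustrative.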

document_id
Optional[UUID]

An optional document ID to assign to the ingested chunks. If not provided, a new document ID will be generated.

metadata
Optional[dict]

An optional metadata dictionary for the document.

Update Files

Update existing documents:

file_paths = ["/path/to/r2r/examples/data/aristotle_v2.txt"]
document_ids = ["9fbe403b-c11c-5aae-8ade-ef22980c3ad1"]
update_response = client.update_files(
  file_paths=file_paths,
  document_ids=document_ids,
  metadatas=[{"x":"y"}] # to overwrite the existing metadata
)

The ingestion configuration can be customized analogously to the ingest files endpoint above.

file_paths
list[str]
required

A list of file paths to update.

document_ids
Optional[list[Union[UUID, str]]]

A list of document IDs corresponding to the files being updated. When not provided, an attempt is made to generate the correct document ID from the given user ID and file path.
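
To illustrate the general idea of deriving a stable ID from a user ID and a file path, the sketch below uses a name-based UUID (version 5). The namespace and the key format are placeholders chosen for this example, so the function will not reproduce the IDs that R2R actually generates; `derive_document_id` is a hypothetical name.

```python
import uuid


def derive_document_id(user_id: str, file_path: str) -> uuid.UUID:
    """Derive a deterministic ID from a user ID and file path.

    Assumption: the namespace and key format are placeholders, not R2R's scheme.
    """
    key = f"{user_id}:{file_path}"
    # uuid5 hashes the namespace and key, so equal inputs always give equal IDs.
    return uuid.uuid5(uuid.NAMESPACE_DNS, key)


first = derive_document_id("user-1", "/path/to/aristotle_v2.txt")
second = derive_document_id("user-1", "/path/to/aristotle_v2.txt")
other_user = derive_document_id("user-2", "/path/to/aristotle_v2.txt")
```

The practical consequence of a deterministic scheme like this is that the same user updating the same path resolves to the same document, while passing document_ids explicitly avoids depending on the derivation at all.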

metadatas
Optional[list[dict]]

An optional list of metadata dictionaries for the updated files.

ingestion_config
Optional[Union[dict, IngestionConfig]]

The ingestion config override parameter enables developers to customize their R2R chunking strategy at runtime.

Documents Overview

Retrieve high-level document information. Results are restricted to the current user’s files, unless the request is made by a superuser, in which case results from all users are returned:

documents_overview = client.documents_overview()

document_ids
Optional[list[Union[UUID, str]]]

An optional list of document IDs to filter the overview.
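
Since document IDs may be given either as UUID objects or as strings, normalizing them up front surfaces malformed values early. The following is a small sketch using only the standard library; `normalize_ids` is a hypothetical helper, not part of the SDK.

```python
import uuid
from typing import Union


def normalize_ids(ids: list[Union[uuid.UUID, str]]) -> list[str]:
    """Convert a mixed list of UUIDs and strings to canonical lowercase strings."""
    normalized: list[str] = []
    for item in ids:
        # uuid.UUID(...) raises ValueError if a string is not a well-formed UUID.
        parsed = item if isinstance(item, uuid.UUID) else uuid.UUID(item)
        normalized.append(str(parsed))
    return normalized


document_ids = normalize_ids(
    ["9FBE403B-C11C-5AAE-8ADE-EF22980C3AD1", uuid.UUID("9fbe403b-c11c-5aae-8ade-ef22980c3ad1")]
)
```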

Document Chunks

Fetch chunks for a particular document:

document_id = "9fbe403b-c11c-5aae-8ade-ef22980c3ad1"
chunks = client.document_chunks(document_id)

document_id
str
required

The ID of the document to retrieve chunks for.

Delete Documents

Delete a document by its ID:

delete_response = client.delete(
  {
    "document_id":
      {"$eq": "9fbe403b-c11c-5aae-8ade-ef22980c3ad1"}
  }
)

filters
list[dict]
required

A list of logical filters over document fields that together identify the set of documents to delete (e.g., {"document_id": {"$eq": "9fbe403b-c11c-5aae-8ade-ef22980c3ad1"}}). Filters may reference fields such as "user_id" or "title" and use operators such as neq, gte, etc.
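
Filters of the shape shown above can be generated programmatically when several documents need to be removed. This sketch uses only the "$eq" operator, which is the one form confirmed by the example; `document_id_filters` is a hypothetical helper name.

```python
def document_id_filters(document_ids: list[str]) -> list[dict]:
    """Build one equality filter per document ID, matching the example's shape."""
    return [{"document_id": {"$eq": doc_id}} for doc_id in document_ids]


filters = document_id_filters(["9fbe403b-c11c-5aae-8ade-ef22980c3ad1"])
# Each entry could then be passed to the delete call shown above, one at a time.
```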