Create Vector Index

Create a new vector similarity search index over the target table. Allowed tables include ‘vectors’, ‘entity’, and ‘document_collections’. The vectors table holds the chunks of text that are indexed for similarity search, whereas entity and document_collections are created during knowledge graph construction.

This endpoint creates a database index optimized for efficient similarity search over vector embeddings. It supports two main indexing methods:

HNSW (Hierarchical Navigable Small World):
- Best for: High-dimensional vectors requiring fast approximate nearest neighbor search
- Pros: Very fast search, good recall, memory-resident for speed
- Cons: Slower index construction, more memory usage
- Key parameters:
  - m: Number of connections per layer (higher = better recall but more memory)
  - ef_construction: Build-time search width (higher = better recall but slower build)
  - ef: Query-time search width (higher = better recall but slower search)

IVF-Flat (Inverted File with Flat Storage):
- Best for: Balance between build speed, search speed, and recall
- Pros: Faster index construction, less memory usage
- Cons: Slightly slower search than HNSW
- Key parameters:
  - lists: Number of clusters (usually sqrt(n) where n is number of vectors)
  - probe: Number of nearest clusters to search
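
The sqrt(n) rule of thumb for lists can be sketched as a small helper. This is only an illustration: `suggest_ivf_lists` and its `minimum` parameter are hypothetical names that are not part of the R2R client, and the right value for a real dataset may differ from the heuristic.

```python
import math


def suggest_ivf_lists(num_vectors: int, minimum: int = 1) -> int:
    """Return a rule-of-thumb cluster count of roughly sqrt(n).

    Hypothetical helper for illustration; not part of the R2R SDK.
    """
    if num_vectors < 0:
        raise ValueError("num_vectors must be non-negative")
    # math.isqrt returns the integer floor of the square root.
    return max(minimum, math.isqrt(num_vectors))


# For a table of 10,000 vectors the heuristic suggests 100 clusters.
index_arguments = {
    "lists": suggest_ivf_lists(10_000),
    "probe": 10,  # number of nearest clusters to search, chosen by hand
}
```

A larger probe value searches more clusters per query, which generally improves recall at the cost of search speed.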

Supported similarity measures:

- cosine_distance: Best for comparing semantic similarity
- l2_distance: Best for comparing absolute distances
- ip_distance: Best for comparing raw dot products
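
To make the difference between the three measures concrete, here is a minimal pure-Python sketch of the underlying math. The function names are illustrative and are not R2R or database functions; in particular, the sign convention the database applies for ip_distance is not stated on this page, so the sketch shows only the raw inner product.

```python
import math
from typing import Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Raw dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (straight-line) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus the cosine of the angle between two vectors."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


# Two vectors pointing the same way but with different lengths.
a = [1.0, 0.0]
b = [2.0, 0.0]

same_direction = cosine_distance(a, b)  # direction only, ignores magnitude
gap = l2_distance(a, b)                 # sensitive to magnitude
dot = inner_product(a, b)               # grows with both magnitudes
```

Because cosine distance ignores vector length, it treats `a` and `b` as identical, while the L2 distance and the inner product both reflect the difference in magnitude.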

Notes:

- Index creation can be resource-intensive for large datasets.
- Use run_with_orchestration=True for large indices to prevent timeouts.
- The ‘concurrently’ option allows other operations while the index is being built.
- Index names must be unique per table.

Request

This endpoint expects an object.

config (object, required)

run_with_orchestration (boolean, optional)

Whether to run index creation as an orchestrated task (recommended for large indices)
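
As a rough sketch of what such a request object might look like, the helper below assembles the two fields and checks the method and measure against the values listed on this page. `build_index_request`, its default values, and the placement of both fields side by side in one object are assumptions made for illustration; they are not confirmed by this page or by the R2R SDK.

```python
from typing import Any, Dict, Optional

# Values taken from the methods and measures listed on this page.
ALLOWED_METHODS = {"hnsw", "ivf_flat"}
ALLOWED_MEASURES = {"cosine_distance", "l2_distance", "ip_distance"}


def build_index_request(
    table_name: str,
    index_method: str,
    index_measure: str,
    index_name: str,
    index_arguments: Optional[Dict[str, int]] = None,
    index_column: str = "embedding",      # assumed default, for illustration
    concurrently: bool = True,            # assumed default, for illustration
    run_with_orchestration: bool = True,  # assumed default, for illustration
) -> Dict[str, Any]:
    """Assemble a request object (hypothetical helper, not an R2R API)."""
    if index_method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported index_method: {index_method!r}")
    if index_measure not in ALLOWED_MEASURES:
        raise ValueError(f"unsupported index_measure: {index_measure!r}")
    return {
        "config": {
            "table_name": table_name,
            "index_method": index_method,
            "index_measure": index_measure,
            "index_arguments": dict(index_arguments or {}),
            "index_name": index_name,
            "index_column": index_column,
            "concurrently": concurrently,
        },
        "run_with_orchestration": run_with_orchestration,
    }


request_body = build_index_request(
    table_name="chunks",
    index_method="hnsw",
    index_measure="cosine_distance",
    index_name="my_document_embeddings_idx",
    index_arguments={"m": 16, "ef_construction": 64, "ef": 40},
)
```

Validating the method and measure on the client side catches typos before a potentially long-running index build is submitted.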

Response

Successful Response

results (object)

from r2r import R2RClient

client = R2RClient()
# when using auth, do client.login(...)

# Create an HNSW index for efficient similarity search
result = client.indices.create(
    config={
        "table_name": "chunks",  # The table containing vector embeddings
        "index_method": "hnsw",  # Hierarchical Navigable Small World graph
        "index_measure": "cosine_distance",  # Similarity measure
        "index_arguments": {
            "m": 16,  # Number of connections per layer
            "ef_construction": 64,  # Size of dynamic candidate list for construction
            "ef": 40,  # Size of dynamic candidate list for search
        },
        "index_name": "my_document_embeddings_idx",
        "index_column": "embedding",
        "concurrently": True,  # Build index without blocking table writes
    },
    run_with_orchestration=True,  # Run as orchestrated task for large indices
)

# Create an IVF-Flat index for balanced performance
result = client.indices.create(
    config={
        "table_name": "chunks",
        "index_method": "ivf_flat",  # Inverted File with Flat storage
        "index_measure": "l2_distance",
        "index_arguments": {
            "lists": 100,  # Number of cluster centroids
            "probe": 10,  # Number of clusters to search
        },
        "index_name": "my_ivf_embeddings_idx",
        "index_column": "embedding",
        "concurrently": True,
    }
)

Example response:

{
  "results": {
    "message": "message"
  }
}