RAG Query
Execute a RAG (Retrieval-Augmented Generation) query.
This endpoint combines search results with language model generation to produce accurate, contextually relevant responses based on your document corpus.
Features:
- Combines vector search, optional knowledge graph integration, and LLM generation
- Automatically cites sources with unique citation identifiers
- Supports both streaming and non-streaming responses
- Compatible with various LLM providers (OpenAI, Anthropic, etc.)
- Web search integration for up-to-date information
Search Configuration: All search parameters from the search endpoint apply here, including filters, hybrid search, and graph-enhanced search.
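For concreteness, here is a minimal sketch of a non-streaming request. The base URL, the `/v3/retrieval/rag` route, and the filter shape are assumptions about a typical deployment, not a verified contract; adjust them to your installation.

```python
# Minimal sketch of a non-streaming RAG request over HTTP.
# The base URL, route, and filter shape are assumptions; adjust
# to your deployment.
import requests

API_BASE = "http://localhost:7272"  # assumed deployment URL
TOKEN = "YOUR_AUTH_TOKEN"

response = requests.post(
    f"{API_BASE}/v3/retrieval/rag",  # assumed route for this endpoint
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "query": "What are the key findings in the Q3 report?",
        "search_mode": "custom",
        "search_settings": {
            # Same parameters as the search endpoint; filter syntax illustrative
            "filters": {"document_type": {"$eq": "report"}},
            "limit": 10,
        },
    },
)
response.raise_for_status()
print(response.json())
```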
Generation Configuration:
Fine-tune the language model's behavior with `rag_generation_config`:
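A hedged sketch of what such a configuration might look like; the field names below are common LLM-generation knobs and are assumptions, not a verified schema:

```python
# Illustrative rag_generation_config payload; field names are
# assumptions based on common LLM generation options.
rag_generation_config = {
    "model": "openai/gpt-4o",  # assumed LiteLLM-style model identifier
    "temperature": 0.7,        # sampling temperature
    "max_tokens": 1024,        # cap on generated tokens
    "stream": False,           # set True for Server-Sent Events (see below)
}
```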
Model Support:
- OpenAI models (default)
- Anthropic Claude models (requires ANTHROPIC_API_KEY)
- Local models via Ollama
- Any provider supported by LiteLLM
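Because routing goes through LiteLLM-compatible providers, switching models typically reduces to swapping the model identifier. The identifiers below follow LiteLLM's provider/model convention and are illustrative:

```python
# Illustrative model identifiers in LiteLLM's provider/model format.
openai_config = {"model": "openai/gpt-4o-mini"}
anthropic_config = {"model": "anthropic/claude-3-5-sonnet-20241022"}  # requires ANTHROPIC_API_KEY
ollama_config = {"model": "ollama/llama3.1"}  # local model served by Ollama
```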
Streaming Responses:
When `stream: true` is set, the endpoint returns Server-Sent Events with the following types:
- `search_results`: Initial search results from your documents
- `message`: Partial tokens as they're generated
- `citation`: Citation metadata when sources are referenced
- `final_answer`: Complete answer with structured citations
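A minimal client-side sketch for consuming the stream, assuming the endpoint emits standard `text/event-stream` lines; the parsing is deliberately simple and skips reconnection handling:

```python
# Minimal SSE consumer sketch using requests; assumes standard
# "event: ..." / "data: ..." text/event-stream framing.
import requests

with requests.post(
    "http://localhost:7272/v3/retrieval/rag",  # assumed route
    headers={"Authorization": "Bearer YOUR_AUTH_TOKEN"},
    json={
        "query": "Summarize the corpus.",
        "rag_generation_config": {"stream": True},  # field names as sketched above
    },
    stream=True,
) as resp:
    resp.raise_for_status()
    event_type = None
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue  # blank lines separate SSE events
        if line.startswith("event:"):
            event_type = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            data = line.split(":", 1)[1].strip()
            print(f"[{event_type}] {data}")
```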
Example Response:
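The original example did not survive extraction; the stream below is an illustrative reconstruction of the event sequence described above. Only the event names come from this page; the payload shapes are made-up placeholders.

```
event: search_results
data: {"results": [{"id": "chunk-1", "text": "..."}]}

event: message
data: {"delta": "The Q3 report shows"}

event: citation
data: {"id": "cit-1", "source": "chunk-1"}

event: final_answer
data: {"answer": "The Q3 report shows ... [cit-1]", "citations": ["cit-1"]}
```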
Headers
Authorization: Bearer authentication of the form `Bearer <token>`, where `<token>` is your auth token.
Request
search_mode
Default value of `custom` allows full control over search settings.
Pre-configured search modes:
- `basic`: A simple semantic-based search.
- `advanced`: A more powerful hybrid search combining semantic and full-text search.
- `custom`: Full control via `search_settings`.
If `filters` or `limit` are provided alongside `basic` or `advanced`, they will override the default settings for that mode.
search_settings
The search configuration object. If `search_mode` is `custom`, these settings are used as-is. For `basic` or `advanced`, these settings will override the default mode configuration. Common overrides include `filters` to narrow results and `limit` to control how many results are returned.
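To make the override behavior concrete, here is a hedged sketch of a request body combining a pre-configured mode with the two common overrides named above; the filter operator syntax is an assumption:

```python
# "advanced" mode with two common overrides; everything else keeps
# the mode's defaults. Filter operator syntax is illustrative.
payload = {
    "query": "How does the auth flow work?",
    "search_mode": "advanced",  # pre-configured hybrid search
    "search_settings": {
        "filters": {"collection_id": {"$eq": "docs"}},  # narrow results
        "limit": 5,  # cap the number of returned results
    },
}
```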