RAG Query
Execute a RAG (Retrieval-Augmented Generation) query.
This endpoint combines search results with language model generation to produce accurate, contextually relevant responses based on your document corpus.
Features:
- Combines vector search, optional knowledge graph integration, and LLM generation
- Automatically cites sources with unique citation identifiers
- Supports both streaming and non-streaming responses
- Compatible with various LLM providers (OpenAI, Anthropic, etc.)
- Web search integration for up-to-date information
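A minimal sketch of a non-streaming call over plain HTTP. The endpoint path (/v3/retrieval/rag), base URL, environment variable names, and the top-level query field are assumptions; adjust them to your deployment.

```python
import os
import requests

# Minimal sketch of a non-streaming RAG query over plain HTTP.
# The endpoint path, base URL, and payload field names are assumptions;
# check your deployment's API reference for the exact values.
BASE_URL = os.environ.get("RAG_API_BASE", "http://localhost:7272")
TOKEN = os.environ["RAG_API_TOKEN"]  # your auth token

resp = requests.post(
    f"{BASE_URL}/v3/retrieval/rag",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"query": "What does the quarterly report say about revenue?"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```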
Search Configuration: All search parameters from the search endpoint apply here, including filters, hybrid search, and graph-enhanced search.
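As a sketch, the search configuration might be passed as below; every nested field name here (filters, limit, use_hybrid_search, graph_search_settings) is an assumption borrowed from typical search APIs, so confirm the schema against the search endpoint's reference.

```python
# Illustrative search configuration reusing the search endpoint's parameters.
# All nested field names are assumptions; verify them against the search
# endpoint's schema.
search_settings = {
    "filters": {"document_type": {"$eq": "report"}},  # metadata filter (syntax assumed)
    "limit": 10,                                      # max retrieved chunks
    "use_hybrid_search": True,                        # semantic + full-text
    "graph_search_settings": {"enabled": True},       # graph-enhanced search
}
```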
Generation Configuration:
Fine-tune the language model's behavior with rag_generation_config:
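A sketch of a rag_generation_config block; the field names (model, temperature, max_tokens, stream) are assumptions based on common generation settings rather than a confirmed schema.

```python
# Assumed generation settings; confirm field names against the API schema.
rag_generation_config = {
    "model": "openai/gpt-4o-mini",  # provider/model string (LiteLLM-style, assumed)
    "temperature": 0.1,             # lower values give more deterministic answers
    "max_tokens": 1024,             # cap on the generated response length
    "stream": False,                # set to True for Server-Sent Events (see below)
}
```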
Model Support:
- OpenAI models (default)
- Anthropic Claude models (requires ANTHROPIC_API_KEY)
- Local models via Ollama
- Any provider supported by LiteLLM
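Model identifiers typically follow LiteLLM's provider/model convention; the exact strings below are examples, not values confirmed by this page.

```python
# Example provider/model strings, assuming LiteLLM-style naming.
MODELS = {
    "openai": "openai/gpt-4o-mini",
    "anthropic": "anthropic/claude-3-5-sonnet-20241022",  # requires ANTHROPIC_API_KEY
    "ollama": "ollama/llama3.1",                           # local model served by Ollama
}
rag_generation_config = {"model": MODELS["anthropic"]}
```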
Streaming Responses:
When stream: true is set, the endpoint returns Server-Sent Events with the following types:
- search_results: Initial search results from your documents
- message: Partial tokens as they're generated
- citation: Citation metadata when sources are referenced
- final_answer: Complete answer with structured citations
Example Response:
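A sketch of reading the stream client-side; the endpoint path and request fields are assumptions, while the event names are the ones listed above.

```python
import os
import requests

# Sketch of consuming the streamed response (stream: true) as Server-Sent Events.
# Endpoint path and request fields are assumptions; the event names
# (search_results, message, citation, final_answer) come from the list above.
BASE_URL = os.environ.get("RAG_API_BASE", "http://localhost:7272")
TOKEN = os.environ["RAG_API_TOKEN"]

with requests.post(
    f"{BASE_URL}/v3/retrieval/rag",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "query": "Summarize the findings.",
        "rag_generation_config": {"stream": True},
    },
    stream=True,
    timeout=300,
) as resp:
    resp.raise_for_status()
    event_type = None
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue                                    # blank separators between events
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = line[len("data:"):].strip()
            if event_type == "message":
                print(data, end="", flush=True)         # partial tokens
            elif event_type == "final_answer":
                print("\nFinal answer payload:", data)  # answer + structured citations
```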
Headers
Bearer authentication of the form Bearer <token>, where token is your auth token.
Request
search_mode: Default value of custom allows full control over search settings. Pre-configured search modes:
- basic: A simple semantic-based search.
- advanced: A more powerful hybrid search combining semantic and full-text search.
- custom: Full control via search_settings.
If filters or limit are provided alongside basic or advanced, they will override the default settings for that mode.
search_settings: The search configuration object. If search_mode is custom, these settings are used as-is. For basic or advanced, these settings will override the default mode configuration. Common overrides include filters to narrow results and limit to control how many results are returned.
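For instance, a request that keeps basic mode but narrows it with overrides might look like the sketch below; the filter key and the $eq operator syntax are assumptions standing in for your own metadata schema.

```python
# Sketch: search_mode "basic" with filters and limit overriding the mode's
# defaults. The "collection" filter key and "$eq" syntax are assumptions.
payload = {
    "query": "What changed in the 2024 policy update?",
    "search_mode": "basic",
    "search_settings": {
        "filters": {"collection": {"$eq": "policies"}},  # narrow the corpus
        "limit": 5,                                      # at most 5 results
    },
}
```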
rag_generation_config: Configuration for RAG generation.
Optional custom prompt to override the default prompt.
Include document titles in responses when available.
Include web search results provided to the LLM.
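The remaining options might be supplied as in the sketch below; the field names task_prompt, include_title_if_available, and include_web_search are assumptions, not names confirmed by this page.

```python
# Hypothetical field names for the prompt override and the two include flags.
payload = {
    "query": "Summarize recent incidents.",
    "rag_generation_config": {"model": "openai/gpt-4o-mini", "temperature": 0.1},
    "task_prompt": "Answer only from the provided context and cite sources.",
    "include_title_if_available": True,   # prepend document titles to the context
    "include_web_search": True,           # pass web search results to the LLM
}
```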
Response
Successful Response
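A sketch of handling a successful non-streaming response; the response field names (results, generated_answer, citations) are assumptions, so inspect the actual JSON returned by your deployment.

```python
# Assumed response shape; print the raw JSON first if the keys differ.
body = resp.json()                       # resp from the non-streaming request above
results = body.get("results", {})
print(results.get("generated_answer"))   # answer text with inline citation markers
for citation in results.get("citations", []):
    print("cited:", citation)            # citation metadata for each referenced source
```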