RAG Query
Execute a RAG (Retrieval-Augmented Generation) query.
This endpoint combines search results with language model generation to produce accurate, contextually relevant responses based on your document corpus.
Features:
- Combines vector search, optional knowledge graph integration, and LLM generation
- Automatically cites sources with unique citation identifiers
- Supports both streaming and non-streaming responses
- Compatible with various LLM providers (OpenAI, Anthropic, etc.)
- Web search integration for up-to-date information
Search Configuration: All search parameters from the search endpoint apply here, including filters, hybrid search, and graph-enhanced search.
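For concreteness, here is a minimal sketch of a non-streaming request. The base URL, the `/v3/retrieval/rag` route, and the filter shape are assumptions about a typical deployment, not a verified contract; adjust them to your installation.

```python
# Minimal sketch of a non-streaming RAG request over HTTP.
# The base URL, route, and filter shape are assumptions; adjust
# to your deployment.
import requests

API_BASE = "http://localhost:7272"  # assumed deployment URL
TOKEN = "YOUR_AUTH_TOKEN"

response = requests.post(
    f"{API_BASE}/v3/retrieval/rag",  # assumed route for this endpoint
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "query": "What are the key findings in the Q3 report?",
        "search_mode": "custom",
        "search_settings": {
            # Same parameters as the search endpoint; filter syntax illustrative
            "filters": {"document_type": {"$eq": "report"}},
            "limit": 10,
        },
    },
)
response.raise_for_status()
print(response.json())
```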
Generation Configuration:
Fine-tune the language model's behavior with `rag_generation_config`:
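A hedged sketch of what such a configuration might look like; the field names below are common LLM-generation knobs and are assumptions, not a verified schema:

```python
# Illustrative rag_generation_config payload; field names are
# assumptions based on common LLM generation options.
rag_generation_config = {
    "model": "openai/gpt-4o",  # assumed LiteLLM-style model identifier
    "temperature": 0.7,        # sampling temperature
    "max_tokens": 1024,        # cap on generated tokens
    "stream": False,           # set True for Server-Sent Events (see below)
}
```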
Model Support:
- OpenAI models (default)
- Anthropic Claude models (requires ANTHROPIC_API_KEY)
- Local models via Ollama
- Any provider supported by LiteLLM
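Because routing goes through LiteLLM-compatible providers, switching models typically reduces to swapping the model identifier. The identifiers below follow LiteLLM's provider/model convention and are illustrative:

```python
# Illustrative model identifiers in LiteLLM's provider/model format.
openai_config = {"model": "openai/gpt-4o-mini"}
anthropic_config = {"model": "anthropic/claude-3-5-sonnet-20241022"}  # requires ANTHROPIC_API_KEY
ollama_config = {"model": "ollama/llama3.1"}  # local model served by Ollama
```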
Streaming Responses:
When `stream: true` is set, the endpoint returns Server-Sent Events with the following types:
- `search_results`: Initial search results from your documents
- `message`: Partial tokens as they're generated
- `citation`: Citation metadata when sources are referenced
- `final_answer`: Complete answer with structured citations
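A minimal client-side sketch for consuming the stream, assuming the endpoint emits standard `text/event-stream` lines; the parsing is deliberately simple and skips reconnection handling:

```python
# Minimal SSE consumer sketch using requests; assumes standard
# "event: ..." / "data: ..." text/event-stream framing.
import requests

with requests.post(
    "http://localhost:7272/v3/retrieval/rag",  # assumed route
    headers={"Authorization": "Bearer YOUR_AUTH_TOKEN"},
    json={
        "query": "Summarize the corpus.",
        "rag_generation_config": {"stream": True},  # field names as sketched above
    },
    stream=True,
) as resp:
    resp.raise_for_status()
    event_type = None
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue  # blank lines separate SSE events
        if line.startswith("event:"):
            event_type = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            data = line.split(":", 1)[1].strip()
            print(f"[{event_type}] {data}")
```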
Example Response:
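The original example did not survive extraction; the stream below is an illustrative reconstruction of the event sequence described above. Only the event names come from this page; the payload shapes are made-up placeholders.

```
event: search_results
data: {"results": [{"id": "chunk-1", "text": "..."}]}

event: message
data: {"delta": "The Q3 report shows"}

event: citation
data: {"id": "cit-1", "source": "chunk-1"}

event: final_answer
data: {"answer": "The Q3 report shows ... [cit-1]", "citations": ["cit-1"]}
```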
Headers
Authorization: Bearer authentication of the form `Bearer <token>`, where `<token>` is your auth token.
Request
search_mode
Default value of `custom` allows full control over search settings.
Pre-configured search modes:
- `basic`: A simple semantic-based search.
- `advanced`: A more powerful hybrid search combining semantic and full-text search.
- `custom`: Full control via `search_settings`.
If `filters` or `limit` are provided alongside `basic` or `advanced`, they will override the default settings for that mode.
search_settings
The search configuration object. If `search_mode` is `custom`, these settings are used as-is. For `basic` or `advanced`, these settings will override the default mode configuration. Common overrides include `filters` to narrow results and `limit` to control how many results are returned.
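To make the override behavior concrete, here is a hedged sketch of a request body combining a pre-configured mode with the two common overrides named above; the filter operator syntax is an assumption:

```python
# "advanced" mode with two common overrides; everything else keeps
# the mode's defaults. Filter operator syntax is illustrative.
payload = {
    "query": "How does the auth flow work?",
    "search_mode": "advanced",  # pre-configured hybrid search
    "search_settings": {
        "filters": {"collection_id": {"$eq": "docs"}},  # narrow results
        "limit": 5,  # cap the number of returned results
    },
}
```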