Streaming Retrieval API
Using real-time streaming for RAG and agent interactions
R2R supports streaming for its retrieval services, including RAG responses and agent interactions. Streamed responses deliver updates as information is retrieved and processed, so applications that benefit from immediate feedback can show progress to users in real time.
Streaming Events
When using streaming in R2R, various event types are emitted during the retrieval and generation process:
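The event types themselves are not reproduced in this text. As a minimal sketch of how a client might route events by type, the following assumes four hypothetical event names (`search_results`, `message`, `citation`, `final_answer`) and hypothetical payload keys (`results`, `delta`, `answer`). None of these are confirmed R2R names; substitute the ones your R2R version emits.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StreamState:
    """Accumulates what the client has seen so far in one streamed response."""
    search_results: list[Any] = field(default_factory=list)
    text: str = ""
    citations: list[dict] = field(default_factory=list)
    done: bool = False


def handle_event(state: StreamState, event_type: str, data: dict) -> StreamState:
    # Event names and payload keys below are hypothetical placeholders.
    if event_type == "search_results":
        state.search_results.extend(data.get("results", []))
    elif event_type == "message":
        state.text += data.get("delta", "")
    elif event_type == "citation":
        state.citations.append(data)
    elif event_type == "final_answer":
        state.text = data.get("answer", state.text)
        state.done = True
    # Unknown event types are ignored so new server events do not break the client.
    return state


# Simulated stream of (event_type, payload) pairs.
events = [
    ("search_results", {"results": ["chunk-1", "chunk-2"]}),
    ("message", {"delta": "Hello"}),
    ("message", {"delta": " world"}),
    ("citation", {"id": "c1", "index": 1}),
    ("final_answer", {"answer": "Hello world [1]"}),
]
state = StreamState()
for event_type, data in events:
    handle_event(state, event_type, data)
```

Keeping all accumulated state in one object makes it straightforward to re-render the UI after each event.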
Streaming RAG
Basic Streaming RAG
To use streaming with basic RAG functionality:
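The SDK calls are not shown here. If you consume the stream over raw HTTP instead, a client has to split it into events itself. The sketch below assumes the endpoint uses Server-Sent Events framing with JSON payloads, which this page does not confirm; `parse_sse` and the sample event names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) pairs from Server-Sent Events lines."""
    event_name = "message"  # SSE default when no "event:" field is sent
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates one event.
            if data_parts:
                # Assumption: every data payload is a JSON object.
                yield event_name, json.loads("\n".join(data_parts))
            event_name, data_parts = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops one leading space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)


sample = [
    'event: message\n',
    'data: {"delta": "Hello"}\n',
    '\n',
    ': keep-alive\n',
    'event: final_answer\n',
    'data: {"answer": "Hello world"}\n',
    '\n',
]
parsed = list(parse_sse(sample))
```

In practice the `lines` argument would be an iterator over the HTTP response body rather than a list.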
Streaming RAG with Web Search
To include web search in your streaming RAG:
Streaming Agent
R2R provides a powerful streaming agent mode that supports complex interactions with both document-based knowledge and web resources.
Basic Streaming Agent
Advanced Research Agent with Tools
R2R’s agent mode can leverage multiple tools to perform in-depth research:
Streaming Citations
R2R’s streaming citations provide detailed attribution information that links specific parts of the response to source documents:
Each citation includes:
- id: Unique identifier for the citation
- index: The display index (e.g., [1], [2])
- start_index and end_index: Character positions in the response
- source_type: The type of source (chunk, graph, web)
- source_id: ID of the specific chunk/node
- document_id: ID of the parent document
- source_title: Title of the source document
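To illustrate how these fields might be used together, here is a minimal sketch that maps each citation back to the text it covers. The `Citation` dataclass and `cited_spans` helper are hypothetical, the field types are guesses, and the code assumes `end_index` is exclusive (like a Python slice bound), which this page does not specify.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    # Field names follow the list above; the types are assumptions.
    id: str
    index: int
    start_index: int
    end_index: int
    source_type: str
    source_id: str
    document_id: str
    source_title: str


def cited_spans(response: str, citations: list[Citation]) -> list[tuple[int, str, str]]:
    """Return (display index, cited text, source title), ordered by position."""
    spans = []
    for c in sorted(citations, key=lambda c: c.start_index):
        # Assumption: end_index is exclusive.
        spans.append((c.index, response[c.start_index:c.end_index], c.source_title))
    return spans


response = "Paris is the capital of France. It hosts the Louvre."
citations = [
    Citation("c2", 2, 32, 52, "web", "w-9", "doc-b", "Museum Guide"),
    Citation("c1", 1, 0, 31, "chunk", "ch-1", "doc-a", "Geography Primer"),
]
spans = cited_spans(response, citations)
```

If the character positions turn out to be inclusive in your R2R version, adjust the slice to `end_index + 1`.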
Implementing Streaming UI
To create a responsive UI with streaming RAG, consider these patterns:
Frontend Implementation
Best Practices
Optimizing Streaming RAG
- Handle Event Types Properly
  - Process each event type according to its purpose
  - Update UI incrementally as events arrive
  - Cache search results to improve perceived performance
- Error Handling
  - Implement robust error handling for stream interruptions
  - Provide fallback mechanisms for connection issues
  - Consider retry logic for temporary failures
- UI Considerations
  - Display typing indicators during generation
  - Highlight citations as they appear
  - Show search results separately from generated content
- Performance
  - Monitor stream processing performance
  - Optimize rendering for large responses
  - Consider batching UI updates for efficiency
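Batching UI updates can be sketched as a small buffer that collects streamed text and re-renders at most once per interval. `UpdateBatcher` is a hypothetical helper, not part of R2R; the injectable `clock` exists only so the behavior can be demonstrated without real delays.

```python
import time
from typing import Callable


class UpdateBatcher:
    """Buffers streamed text deltas and flushes them at most once per interval."""

    def __init__(self, render: Callable[[str], None], interval_s: float = 0.05,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._render = render
        self._interval_s = interval_s
        self._clock = clock
        self._buffer: list[str] = []
        self._last_flush = clock()

    def push(self, delta: str) -> None:
        self._buffer.append(delta)
        # Flush only when enough time has passed since the last render.
        if self._clock() - self._last_flush >= self._interval_s:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._render("".join(self._buffer))
            self._buffer.clear()
        self._last_flush = self._clock()


# Demonstration with a fake clock that advances 20 ms per token.
fake_now = [0.0]
rendered: list[str] = []
batcher = UpdateBatcher(rendered.append, interval_s=0.05, clock=lambda: fake_now[0])
for i, token in enumerate(["Str", "eam", "ing ", "wor", "ks"]):
    fake_now[0] = i * 0.02
    batcher.push(token)
batcher.flush()  # Always flush once more when the stream ends.
```

Five tokens produce two render calls here instead of five; the final `flush()` ensures no trailing text is lost.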
Example Implementation
Here’s a complete example of RAG with hybrid search, web integration, and streaming:
Advanced Configuration
Customizing Streaming Behavior
Limitations and Considerations
- Stream connections require stable network connectivity
- Processing streams requires more client-side logic than non-streaming requests
- Citation indices may not be finalized until the entire response is generated
- Some LLM providers may have different streaming behaviors or limitations
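Because streams depend on a stable connection, a client may want to reopen a dropped stream. The following is a minimal sketch with exponential backoff. `stream_with_retry`, `open_stream`, and `on_retry` are hypothetical names, and the sketch assumes a dropped stream cannot be resumed, so each retry restarts the request and the caller discards partial output.

```python
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def stream_with_retry(open_stream: Callable[[], Iterator[T]],
                      max_attempts: int = 3,
                      base_delay_s: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep,
                      on_retry: Optional[Callable[[int], None]] = None) -> Iterator[T]:
    """Yield events from open_stream(), reopening it after connection errors."""
    attempt = 0
    while True:
        try:
            for event in open_stream():
                yield event
            return  # Stream finished normally.
        except ConnectionError:
            attempt += 1
            if attempt >= max_attempts:
                raise
            if on_retry is not None:
                on_retry(attempt)  # Let the caller reset partial state.
            sleep(base_delay_s * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...


# Demonstration: the first attempt drops after one event, the second succeeds.
calls = {"n": 0}


def flaky_stream() -> Iterator[str]:
    calls["n"] += 1
    yield "partial"
    if calls["n"] == 1:
        raise ConnectionError("dropped")
    yield "rest"


received: list[str] = []
delays: list[float] = []
for ev in stream_with_retry(flaky_stream, sleep=delays.append,
                            on_retry=lambda attempt: received.clear()):
    received.append(ev)
```

Clearing `received` in `on_retry` avoids showing duplicated text after a restart.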
For more information on RAG and retrieval capabilities, see the "Search and RAG" and "Retrieval API Reference" documentation.