This cookbook will guide you through implementing Hypothetical Document Embeddings (HyDE), an advanced technique that significantly enhances RAG performance. By the end of this guide, you’ll have a powerful, flexible RAG system that pushes the boundaries of what’s possible in information retrieval and generation. You can then use it as a starting point for your own experimentation.

What is HyDE?

HyDE is an innovative approach that supercharges dense retrieval, especially in zero-shot scenarios. Here’s how it works:

  1. Query Expansion: HyDE uses a Language Model to generate hypothetical answers or documents based on the user’s query.
  2. Enhanced Embedding: These hypothetical documents are embedded, creating a richer semantic search space.
  3. Similarity Search: The embeddings are used to find the most relevant actual documents in your database.
  4. Informed Generation: The retrieved documents and original query are used to generate the final response.

The magic of HyDE lies in its ability to bridge the gap between queries and relevant documents, even when there’s no direct lexical overlap.
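The four steps above can be sketched in plain Python. Note that `generate_hypothetical_docs` and `embed` below are hypothetical stand-ins for an LLM call and an embedding model, not R2R APIs; the stub embeddings are random, so the ranking here only illustrates the mechanics:

```python
import math
import random

# Hypothetical stand-ins for an LLM and an embedding model (not R2R APIs).
def generate_hypothetical_docs(query: str, n: int = 3) -> list[str]:
    # In practice this is an LLM call with a HyDE-style prompt.
    return [f"Hypothetical answer {i} to: {query}" for i in range(n)]

def embed(text: str) -> list[float]:
    # In practice this is an embedding-model call; here, a deterministic stub.
    rng = random.Random(text)
    vec = [rng.gauss(0, 1) for _ in range(8)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def hyde_search(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    # 1. Query expansion: generate hypothetical answers to the query.
    hypotheticals = generate_hypothetical_docs(query)
    # 2. Enhanced embedding: embed the hypotheticals and average them.
    vecs = [embed(h) for h in hypotheticals]
    query_vec = [sum(col) / len(col) for col in zip(*vecs)]
    # 3. Similarity search: rank real documents by cosine similarity.
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    # 4. Informed generation: the caller passes `ranked[:top_k]` plus the query to the LLM.
    return ranked[:top_k]

corpus = [
    "Aristotle was an ancient Greek philosopher.",
    "1984 is a dystopian novel by George Orwell.",
    "SciPhi builds RAG infrastructure for developers.",
]
print(hyde_search("Who was Aristotle?", corpus))
```

R2R packages this same flow into composable pipes, as shown in the remainder of this cookbook.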

MultiSearch Diagram

R2R is designed from the ground up to make implementing advanced techniques like HyDE a breeze. Its modular architecture, based on pipes and pipelines, allows for easy customization and extension. Let’s see how R2R’s components come together to create a HyDE-powered RAG system:

The diagram below illustrates how the HyDE flow fits into the schema of the diagram above:

Implementing HyDE

Step 1: Set Up Your R2R Application

First, let’s configure our R2R application to use the HyDE-enabled pipeline:

from r2r import R2RBuilder, R2RConfig, R2RPipeFactoryWithMultiSearch

config = R2RConfig.from_json()
app = (
    R2RBuilder(config)
    .with_pipe_factory(R2RPipeFactoryWithMultiSearch)
    .build(task_prompt_name="hyde")  # propagated to the factory below
)

The sequence above works by using a custom pipe factory, R2RPipeFactoryWithMultiSearch, located in r2r/prebuilts/. The custom factory overrides the default create_vector_search_pipe method of the R2RPipeFactory to create a MultiSearchPipe instead. The MultiSearchPipe is a pre-built implementation in R2R which converts an incoming query into multiple downstream queries.

class R2RPipeFactoryWithMultiSearch(R2RPipeFactory):
    def create_vector_search_pipe(self, *args, **kwargs):
        """A factory method to create a search pipe.

        Overrides include:
            task_prompt_name: str
            multi_query_transform_pipe_override: QueryTransformPipe
            multi_inner_search_pipe_override: SearchPipe
            query_generation_template_override: {'template': str, 'input_types': dict[str, str]}
        """
        # Initialize the new query transform pipe
        # (constructor arguments elided here for brevity)
        query_transform_pipe = kwargs.get(
            "multi_query_transform_pipe_override", None
        ) or QueryTransformPipe(...)

        # Create search pipe override and pipes
        inner_search_pipe = kwargs.get(
            "multi_inner_search_pipe_override", None
        ) or super().create_vector_search_pipe(*args, **kwargs)

        # Combine the transform and inner search pipes into a single multi-search pipe
        return MultiSearchPipe(...)


The specification task_prompt_name="hyde" is fed into the QueryTransformPipe so that the incoming query is transformed according to the HyDE prompt. This default prompt ships with the R2R prompt provider under the name hyde and is shown below:

{"name": "hyde", "template": "### Instruction:\n\nGiven the query that follows write a double newline separated list of {num_outputs} single paragraph distinct attempted answers to the given query. \nDO NOT generate any single answer which is likely to require information from multiple distinct documents, \nEACH single answer will be used to carry out a cosine similarity semantic search over distinct indexed documents, such as varied medical documents. \nFOR EXAMPLE if asked `how do the key themes of Great Gatsby compare with 1984`, the two attempted answers would be \n`The key themes of Great Gatsby are ... ANSWER_CONTINUED` and `The key themes of 1984 are ... ANSWER_CONTINUED`, where `ANSWER_CONTINUED` IS TO BE COMPLETED BY YOU in your response. \nHere is the original user query to be transformed into answers:\n\n### Query:\n{message}\n\n### Response:\n", "input_types": {"num_outputs": "int", "message": "str"}}
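To see exactly what the transform sends to the LLM, you can render the template yourself with standard `str.format`. The snippet below uses an abbreviated copy of the template above for readability; the full version ships with R2R:

```python
# Abbreviated copy of the built-in `hyde` template shown above.
hyde_template = (
    "### Instruction:\n\n"
    "Given the query that follows write a double newline separated list of "
    "{num_outputs} single paragraph distinct attempted answers to the given query.\n"
    "Here is the original user query to be transformed into answers:\n\n"
    "### Query:\n{message}\n\n### Response:\n"
)

# Fill the declared input_types: num_outputs (int) and message (str).
rendered = hyde_template.format(
    num_outputs=2,
    message="how do the key themes of Great Gatsby compare with 1984",
)
print(rendered)
```

Each double-newline-separated answer in the LLM's response then becomes one downstream search query.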

Step 2: Run Your HyDE-Powered RAG Query

The factory above creates a pipeline that:

  1. Transforms the query into hypothetical documents
  2. Searches for relevant documents using these hypotheticals
  3. Combines the results for final RAG generation

Now, let’s put our HyDE-enhanced RAG system to work:

from r2r import GenerationConfig

# if running with app.serve() then we would instead call client.rag(...)

def run_hyde_query(question: str):
    result = app.rag(
        question,
        # argument in `QueryTransformPipe` to specify generation of new searches
        # (keyword name may vary across R2R versions)
        query_transform_generation_config=GenerationConfig(model="gpt-4o"),
        rag_generation_config=GenerationConfig(model="gpt-4o"),
    )
    print(f"HyDE-Enhanced Answer:\n\n{result}")

# Let's ask a complex question
run_hyde_query("What are the ethical implications of using AI in healthcare?")

Step 3: Define a Custom HyDE Prompt

R2R comes with a default HyDE prompt, but you can easily customize it and pass it into your application builder:

custom_hyde_prompt = {
    "name": "hyde",
    "template": """
    ### Instruction:
    Given the query that follows, write {num_outputs} distinct, single-paragraph attempted answers.
    Each answer should focus on a specific aspect of the query and be suitable for semantic search.

    ### Query:
    {message}

    ### Response:
    """,
    "input_types": {"num_outputs": "int", "message": "str"},
}

# Use the `query_generation_template_override` for a custom prompt:
app = R2RBuilder(config).with_pipe_factory(
    R2RPipeFactoryWithMultiSearch
).build(query_generation_template_override=custom_hyde_prompt)

Running HyDE

R2R ships with all the code necessary to run with HyDE. Here’s how you can try it out from the command line:

The command below does not make use of the R2R server; therefore, all relevant environment variables (e.g., for Postgres and OpenAI) must be set locally before executing it.

python -m r2r.examples.scripts.run_hyde --query="Who was aristotle?"

This script sets up the R2R application with HyDE, as shown above, and runs your query.

Stylized Output
Yielding transformed output: Aristotle was an ancient Greek philosopher and polymath during the Classical period in Ancient Greece. He was a student of Plato and later became the teacher of Alexander the Great. His writings cover many subjects, including physics, biology, zoology, metaphysics, logic, ethics, aesthetics, poetry, theater, music, rhetoric, psychology, linguistics, economics, politics, and government. Aristotle's works have influenced various fields of knowledge and are considered foundational texts in Western philosophy.
Yielding transformed output: Aristotle was born in 384 BCE in Stagira, a small town on the northern coast of Greece. He joined Plato's Academy in Athens at the age of seventeen and remained there until Plato's death. After leaving the Academy, Aristotle spent some time traveling and studying in Asia Minor and Lesbos, where he conducted research in biology and natural sciences. He later returned to Macedonia to tutor Alexander the Great before eventually founding his own school, the Lyceum, in Athens.
Yielding transformed output: Aristotle's contributions to philosophy and science are vast and varied. He is known for developing the first comprehensive system of Western philosophy, which included a theory of logic, a theory of the natural world, and a theory of ethics. His work in logic, particularly the syllogism, laid the groundwork for deductive reasoning. In natural sciences, he made significant contributions to biology and zoology, classifying living organisms and studying their anatomy and behavior. His ethical theories, particularly the concept of virtue ethics, continue to be influential in contemporary moral philosophy.
Final Result:
    "Aristotle was an Ancient Greek philosopher and polymath [1]. He was born in the city of Stagira in northern Greece during the Classical period [4]. Aristotle was the founder of the Peripatetic school of philosophy in the Lyceum in Athens [1]. His writings covered a broad range of subjects spanning the natural sciences, philosophy, linguistics, economics, politics, psychology, and the arts [1]. Aristotle's influence extended to fields such as logic, biology, political science, zoology, embryology, natural law, scientific method, rhetoric, psychology, realism, criticism, individualism, teleology, and meteorology [7], [22]. He was also known for his contributions to formal logic, zoology, and the scientific method [10], [21]. Aristotle's works were studied by medieval scholars and his influence continued well into the 19th century [23]."    

HyDE Benefits & Considerations

Benefits:
  • Improved Zero-Shot Performance: Excel at answering questions on topics not explicitly in your database.
  • Enhanced Retrieval Quality: Find more semantically relevant documents, even with complex queries.
  • Flexibility: Easily adapt to different domains and tasks by tweaking the HyDE prompt.


Considerations:

  • Computational Cost: Generating hypothetical documents adds processing time. Consider the trade-off for your use case.
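As a back-of-the-envelope sketch of that trade-off (all numbers below are illustrative assumptions, not measurements):

```python
# Illustrative cost model: HyDE adds one LLM call that must generate
# `num_outputs` hypothetical answers before retrieval can even start.
def hyde_overhead_tokens(num_outputs: int, tokens_per_answer: int) -> int:
    # Extra generated tokens per query, on top of the final RAG generation.
    return num_outputs * tokens_per_answer

# e.g. 3 hypothetical answers at ~150 generated tokens each
extra = hyde_overhead_tokens(3, 150)
print(extra)  # 450 extra generated tokens per query
```

Because this generation happens before retrieval, it adds latency serially; weigh it against the retrieval-quality gains for your workload.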

As another example of advanced RAG we turn to web search. Combining web search with your RAG pipeline allows for access to real-time information from the internet. This is particularly useful for queries that require up-to-date information or when your local knowledge base might not have sufficient data.

Implementing Web Search in R2R

To use web search in your R2R application, you’ll need to create a WebSearchPipe and integrate it into your pipeline. Here’s how you can do it:

  1. First, create a WebSearchPipe:
from r2r import SerperClient, WebSearchPipe

web_search_pipe = WebSearchPipe(
    serper_client=SerperClient()  # Requires a Serper API key
)
  1. Then, use the R2RBuilder to create your application with the web search pipe:
from r2r import R2RBuilder, GenerationConfig

app = R2RBuilder().with_search_pipe(web_search_pipe).build()

result = app.rag(
    "What does SciPhi do?",
    rag_generation_config=GenerationConfig(model="gpt-4o"),
)
print(f"Final Result:\n\n{result}")

Running Web Search RAG

You can run a web search-enabled RAG query using the provided script:

python -m r2r.examples.scripts.run_web_rag --query="What does SciPhi do?"

This will produce output similar to:

Final Result:
    'SciPhi is a cloud platform that simplifies the deployment and optimization of production-ready Retrieval-Augmented Generation (RAG) pipelines for developers [1]. It allows users to effortlessly build, deploy, and scale RAG systems, enabling them to focus on AI innovation while SciPhi handles the infrastructure [2].'

HyDE + Web Search

As an example, we show how R2R can be used to combine Hypothetical Document Embeddings (HyDE) with web search for a powerful, adaptive RAG system.

import fire
from r2r import R2RBuilder, R2RPipeFactoryWithMultiSearch, SerperClient, WebSearchPipe
from r2r.base.abstractions.llm import GenerationConfig

def run_rag_pipeline(query="Who was Aristotle?"):
    web_search_pipe = WebSearchPipe(serper_client=SerperClient())

    synthetic_query_generation_template = {
        "name": "synthetic_query_generation_template",
        "template": """
            ### Instruction:
            Given the following query, write a double newline separated list of up to {num_outputs} advanced queries meant to help answer the original query.
            DO NOT generate any single query which is likely to require information from multiple distinct documents.
            EACH single query will be used to carry out a cosine similarity semantic search over distinct indexed documents.
            FOR EXAMPLE, if asked `how do the key themes of Great Gatsby compare with 1984`, the two queries would be
            `What are the key themes of Great Gatsby?` and `What are the key themes of 1984?`.
            Here is the original user query to be transformed into queries:

            ### Query:
            {message}

            ### Response:
        """,
        "input_types": {"num_outputs": "int", "message": "str"},
    }

    app = (
        R2RBuilder()
        .with_pipe_factory(R2RPipeFactoryWithMultiSearch)
        .build(
            # override inputs consumed in building the MultiSearchPipe
            multi_inner_search_pipe_override=web_search_pipe,
            query_generation_template_override=synthetic_query_generation_template,
        )
    )

    result = app.rag(query, rag_generation_config=GenerationConfig(model="gpt-4o"))
    print(f"Final Result:\n\n{result}")

if __name__ == "__main__":
    fire.Fire(run_rag_pipeline)

Now, let’s run the combined HyDE and web search RAG query:

poetry run python -m r2r.examples.scripts.run_web_multi_rag --query="What does the r2r rag engine do?"
Final Result:
    The R2R RAG engine, also known as "RAG to Riches," is an open-source Python framework designed to simplify the development, deployment, and optimization of Retrieval-Augmented Generation (RAG) systems [4], [16]. It provides out-of-the-box support for multimodal RAG, hybrid search, knowledge graph-powered RAG, and application management [2]. Additionally, it features a RESTful API and includes functionalities such as hybrid search and graph/multimodal RAG [6], [11]. The framework aims to bridge the gap between local LLM experimentation and scalable, production-ready RAG [1].

Conclusion and Next Steps

Congratulations! You’ve just implemented an advanced RAG technique that puts you at the forefront of AI-powered information retrieval and generation. With R2R and HyDE, you’re well-equipped to tackle complex queries and deliver high-quality, context-aware responses.

Where to go from here?

  1. Experiment with different HyDE prompts to optimize for your specific domain.
  2. Try combining HyDE with other R2R features like knowledge graph integration for even more powerful RAG systems.
  3. Dive into R2R’s documentation to discover more advanced techniques and customization options.