Deep Dive
Factory

Pipeline Factory

The E2EPipelineFactory is a factory class that creates the end-to-end pipeline by assembling various components based on the provided configuration.

Usage

To create an end-to-end pipeline using the E2EPipelineFactory, you need to provide the necessary configuration (R2RConfig) and optionally specify custom implementations for the vector database, embeddings provider, language model, text splitter, ingestion pipeline, embedding pipeline, RAG pipeline, evaluation pipeline, and application function.

from r2r.main import E2EPipelineFactory, R2RConfig
 
app = E2EPipelineFactory.create_pipeline(
    config=R2RConfig.load_config(),
    vector_db_provider=None,
    embedding_provider=None,
    llm_provider=None,
    adapters=None,
    ingestion_pipeline_impl=BasicIngestionPipeline,
    embedding_pipeline_impl=BasicEmbeddingPipeline,
    rag_pipeline_impl=QnARAGPipeline,
    eval_pipeline_impl=BasicEvalPipeline,
    app_fn=create_app,
)

By default, the E2EPipelineFactory uses the configurations specified in the R2RConfig to create the pipeline components. However, you can override any of the components by providing custom implementations or instances.

The resulting app represents the end-to-end pipeline application that can be used to process documents, perform embeddings, generate responses using the RAG pipeline, and evaluate the pipeline's performance.

Methods

get_vector_db(database_config: dict[str, Any])

This method retrieves the appropriate vector database based on the database_config. It supports the following vector database providers:

  • qdrant: Returns an instance of QdrantDB.
  • pgvector: Returns an instance of PGVectorDB.
  • local: Returns an instance of LocalVectorDB.

get_embedding_provider(embedding_config: dict[str, Any])

This method retrieves the appropriate embeddings provider based on the embedding_config. It supports the following embeddings providers:

  • openai: Returns an instance of OpenAIEmbeddingProvider.
  • sentence-transformers: Returns an instance of SentenceTransformerEmbeddingProvider with the specified model.

get_llm(llm_config: dict[str, Any])

This method retrieves the appropriate language model based on the llm_config. It supports the following language models:

  • openai: Returns an instance of OpenAILLM.
  • litellm: Returns an instance of LiteLLM.

get_text_splitter(text_splitter_config: dict[str, Any])

This method retrieves the appropriate text splitter based on the text_splitter_config. It currently only supports the recursive_character text splitter and returns an instance of RecursiveCharacterTextSplitter with the specified chunk_size, chunk_overlap, and length_function.

create_pipeline(config: R2RConfig, ...)

This method creates the end-to-end pipeline by assembling various components based on the provided config and optional parameters. It performs the following steps:

  1. Sets up logging based on the logging_database configuration.
  2. Retrieves the embeddings provider using get_embedding_provider or uses the provided embedding_provider.
  3. Retrieves the vector database using get_vector_db_provider or uses the provided vector_db_provider.
  4. Initializes the vector database collection with the specified collection_name and embedding_dimension.
  5. Retrieves the language model using get_llm_provider or uses the provided llm.
  6. Creates a LoggingDatabaseConnection instance for logging purposes.
  7. Creates an instance of the rag_pipeline_impl (default: QnARAGPipeline) with the language model, vector database, embedding model, embeddings provider, and logging connection.
  8. Creates an instance of the embedding_pipeline_impl (default: BasicEmbeddingPipeline) with the embedding model, embeddings provider, vector database, logging connection, text splitter, and embedding batch size.
  9. Creates an instance of the eval_pipeline_impl (default: BasicEvalPipeline) with the evaluation configuration and logging connection.
  10. Creates an instance of the ingestion_pipeline_impl (default: BasicIngestionPipeline) with the specified adapters.
  11. Creates the end-to-end pipeline application using the app_fn (default: create_app) with the ingestion pipeline, embedding pipeline, RAG pipeline, evaluation pipeline, configuration, and logging connection.
  12. Returns the created pipeline application.