Pipeline Factory
The E2EPipelineFactory is a factory class that creates the end-to-end pipeline by assembling various components based on the provided configuration.
Usage
To create an end-to-end pipeline using the E2EPipelineFactory, you need to provide the necessary configuration (R2RConfig) and optionally specify custom implementations for the vector database, embeddings provider, language model, text splitter, ingestion pipeline, embedding pipeline, RAG pipeline, evaluation pipeline, and application function.
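from r2r.main import E2EPipelineFactory, R2RConfig

# Assumed import locations for the default implementations used below;
# adjust them to match your installed r2r version.
from r2r.main import create_app
from r2r.pipelines import (
    BasicEmbeddingPipeline,
    BasicEvalPipeline,
    BasicIngestionPipeline,
    QnARAGPipeline,
)

app = E2EPipelineFactory.create_pipeline(
    config=R2RConfig.load_config(),
    vector_db_provider=None,  # None: built from the config
    embedding_provider=None,  # None: built from the config
    llm_provider=None,  # None: built from the config
    adapters=None,
    ingestion_pipeline_impl=BasicIngestionPipeline,
    embedding_pipeline_impl=BasicEmbeddingPipeline,
    rag_pipeline_impl=QnARAGPipeline,
    eval_pipeline_impl=BasicEvalPipeline,
    app_fn=create_app,
)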
By default, the E2EPipelineFactory uses the configurations specified in the R2RConfig to create the pipeline components. However, you can override any of the components by providing custom implementations or instances.
The resulting app represents the end-to-end pipeline application that can be used to process documents, perform embeddings, generate responses using the RAG pipeline, and evaluate the pipeline's performance.
Methods
get_vector_db(database_config: dict[str, Any])
This method retrieves the appropriate vector database based on the database_config. It supports the following vector database providers:
- qdrant: Returns an instance of QdrantDB.
- pgvector: Returns an instance of PGVectorDB.
- local: Returns an instance of LocalVectorDB.
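The provider lookup described above can be pictured as a dispatch on a provider name. The following is a minimal sketch under stated assumptions, not the library's implementation: the three classes are empty stand-ins for QdrantDB, PGVectorDB, and LocalVectorDB, the function name get_vector_db_sketch is hypothetical, and the "provider" key in database_config is an assumption, since the key name is not given here.

```python
from typing import Any


# Empty stand-ins for the real classes named above (hypothetical bodies).
class QdrantDB:
    pass


class PGVectorDB:
    pass


class LocalVectorDB:
    pass


# Maps each supported provider name to the class that implements it.
VECTOR_DB_CLASSES = {
    "qdrant": QdrantDB,
    "pgvector": PGVectorDB,
    "local": LocalVectorDB,
}


def get_vector_db_sketch(database_config: dict[str, Any]):
    """Return a vector database instance for the configured provider."""
    # "provider" is an assumed key name, used here for illustration only.
    provider = database_config.get("provider")
    try:
        db_class = VECTOR_DB_CLASSES[provider]
    except KeyError:
        raise ValueError(
            f"Unsupported vector database provider: {provider!r}"
        ) from None
    return db_class()


db = get_vector_db_sketch({"provider": "local"})
print(type(db).__name__)
```

A lookup table keeps the supported names in one place and makes an unsupported provider fail early with a clear error instead of silently returning nothing.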
get_embedding_provider(embedding_config: dict[str, Any])
This method retrieves the appropriate embeddings provider based on the embedding_config. It supports the following embeddings providers:
- openai: Returns an instance of OpenAIEmbeddingProvider.
- sentence-transformers: Returns an instance of SentenceTransformerEmbeddingProvider with the specified model.
get_llm(llm_config: dict[str, Any])
This method retrieves the appropriate language model based on the llm_config. It supports the following language models:
- openai: Returns an instance of OpenAILLM.
- litellm: Returns an instance of LiteLLM.
get_text_splitter(text_splitter_config: dict[str, Any])
This method retrieves the appropriate text splitter based on the text_splitter_config. It currently only supports the recursive_character text splitter and returns an instance of RecursiveCharacterTextSplitter with the specified chunk_size, chunk_overlap, and length_function.
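To make the three splitter parameters concrete, here is a deliberately simplified word-level splitter. It is not the recursive character algorithm used by RecursiveCharacterTextSplitter; it only illustrates how chunk_size caps a chunk, how chunk_overlap carries trailing text into the next chunk, and how length_function decides what "size" means. The function name split_words is hypothetical.

```python
from typing import Callable


def split_words(
    text: str,
    chunk_size: int,
    chunk_overlap: int,
    length_function: Callable[[str], int] = len,
) -> list[str]:
    """Greedy word-level splitter (a simplification, not the recursive one).

    A single word longer than chunk_size is still emitted as its own chunk.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    chunks: list[str] = []
    current: list[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and length_function(candidate) > chunk_size:
            chunks.append(" ".join(current))
            # Carry over trailing words that fit within chunk_overlap.
            overlap: list[str] = []
            for prev in reversed(current):
                if length_function(" ".join([prev] + overlap)) > chunk_overlap:
                    break
                overlap.insert(0, prev)
            # Drop carried words if they would push the new chunk over the cap.
            while overlap and length_function(" ".join(overlap + [word])) > chunk_size:
                overlap.pop(0)
            current = overlap
        current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


# Measure length in characters (the default length_function).
by_chars = split_words("one two three four five six", chunk_size=13, chunk_overlap=5)
print(by_chars)

# Measure length in words instead by swapping the length_function.
by_words = split_words(
    "a b c d e",
    chunk_size=2,
    chunk_overlap=1,
    length_function=lambda s: len(s.split()),
)
print(by_words)
```

Swapping length_function is what lets the same size limits be expressed in characters, words, or tokens without changing the splitting logic.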
create_pipeline(config: R2RConfig, ...)
This method creates the end-to-end pipeline by assembling various components based on the provided config and optional parameters. It performs the following steps:
- Sets up logging based on the logging_database configuration.
- Retrieves the embeddings provider using get_embedding_provider or uses the provided embedding_provider.
- Retrieves the vector database using get_vector_db_provider or uses the provided vector_db_provider.
- Initializes the vector database collection with the specified collection_name and embedding_dimension.
- Retrieves the language model using get_llm_provider or uses the provided llm_provider.
- Creates a LoggingDatabaseConnection instance for logging purposes.
- Creates an instance of the rag_pipeline_impl (default: QnARAGPipeline) with the language model, vector database, embedding model, embeddings provider, and logging connection.
- Creates an instance of the embedding_pipeline_impl (default: BasicEmbeddingPipeline) with the embedding model, embeddings provider, vector database, logging connection, text splitter, and embedding batch size.
- Creates an instance of the eval_pipeline_impl (default: BasicEvalPipeline) with the evaluation configuration and logging connection.
- Creates an instance of the ingestion_pipeline_impl (default: BasicIngestionPipeline) with the specified adapters.
- Creates the end-to-end pipeline application using the app_fn (default: create_app) with the ingestion pipeline, embedding pipeline, RAG pipeline, evaluation pipeline, configuration, and logging connection.
- Returns the created pipeline application.
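The assembly order above can be sketched with stand-in objects. This is an illustration under stated assumptions, not the factory's actual code: Component is a hypothetical placeholder for every real class, the config keys ("embedding", "vector_db", "llm") are invented for the example, and logging setup and collection initialization are omitted. It shows the two ideas that matter: a supplied provider takes precedence over one built from the config, and the pipelines are built from the shared providers before being handed to app_fn.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Component:
    """Hypothetical stand-in that records its name and what it was built from."""

    name: str
    deps: tuple = ()


def _resolve(provided: Optional[Component], build: Callable[[], Component]) -> Component:
    # A caller-supplied instance wins; otherwise build one from the config.
    return provided if provided is not None else build()


def create_pipeline_sketch(
    config: dict[str, Any],
    embedding_provider: Optional[Component] = None,
    vector_db_provider: Optional[Component] = None,
    llm_provider: Optional[Component] = None,
    app_fn: Callable[..., Any] = dict,
) -> Any:
    # The config keys used here are assumptions made for this example.
    embeddings = _resolve(
        embedding_provider, lambda: Component("embeddings:" + config["embedding"])
    )
    vector_db = _resolve(
        vector_db_provider, lambda: Component("vector_db:" + config["vector_db"])
    )
    llm = _resolve(llm_provider, lambda: Component("llm:" + config["llm"]))
    logging_connection = Component("logging")

    # Each pipeline receives the shared providers it depends on.
    rag = Component(
        "rag", (llm.name, vector_db.name, embeddings.name, logging_connection.name)
    )
    embedding = Component(
        "embedding", (embeddings.name, vector_db.name, logging_connection.name)
    )
    evaluation = Component("eval", (logging_connection.name,))
    ingestion = Component("ingestion")

    # app_fn receives every pipeline plus the config and logging connection.
    return app_fn(
        ingestion_pipeline=ingestion,
        embedding_pipeline=embedding,
        rag_pipeline=rag,
        eval_pipeline=evaluation,
        config=config,
        logging_connection=logging_connection,
    )


cfg = {"embedding": "openai", "vector_db": "local", "llm": "openai"}
default_app = create_pipeline_sketch(cfg)
custom_app = create_pipeline_sketch(cfg, vector_db_provider=Component("vector_db:custom"))
print(default_app["rag_pipeline"].deps)
print(custom_app["rag_pipeline"].deps)
```

Here app_fn defaults to dict only so that the result is easy to inspect; in the real factory it is the function that builds the application.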