Eval Pipeline
The Eval Pipeline is responsible for evaluating the quality of generated completions by comparing them against the query and context. It supports different evaluation providers and allows for flexible configuration of evaluation sampling_fraction.
Eval Provider
The EvalProvider
class represents a provider for evaluating completions. It supports different evaluation providers and allows for sampling evaluations based on a specified fraction.
Initialization
The EvalProvider
is initialized with the following parameters:
provider
: The name of the evaluation provider (e.g., "deepeval", "parea").sampling_fraction
(optional): The fraction of evaluations to perform (default is 1.0, meaning all evaluations are performed).
Evaluation
The evaluate
method performs the evaluation of the query, context, and completion using the specified evaluation provider. It takes the following parameters:
query
: The input query.context
: The context used for generating the completion.completion
: The generated completion to be evaluated.
The method checks if a random value is less than the specified sampling fraction. If the condition is met, it calls the actual evaluation method of the provider and returns the result. If the condition is not met, it returns None
(no evaluation performed).
Basic Eval Pipeline
The BasicEvalPipeline
is a simple implementation of the EvalPipeline
abstract base class. It provides a straightforward way to perform evaluations using a specified evaluation provider and configuration.
Initialization
The BasicEvalPipeline
is initialized with the following parameters:
eval_config
: A dictionary containing the evaluation configuration, including the evaluation sampling_fraction and provider.logging_connection
(optional): An instance ofLoggingDatabaseConnection
for logging.
The pipeline checks if the specified evaluation provider is supported (currently supports "deepeval" and "parea"). If the provider is "deepeval", it attempts to import the DeepEvalProvider
and raises an import error if the package is not installed.
Evaluation
The evaluate
method performs the evaluation of the query, context, and completion using the specified evaluation provider. It is decorated with @log_execution_to_db
to log the execution to the database.
Using a Custom Eval Pipeline
To create a custom eval pipeline, you can subclass the EvalPipeline
abstract base class and implement the necessary methods. Here's an example:
from typing import Any, Optional
from r2r.core import EvalPipeline, LoggingDatabaseConnection, log_execution_to_db
class CustomEvalPipeline(EvalPipeline):
def __init__(
self,
eval_config: dict,
logging_connection: Optional[LoggingDatabaseConnection] = None,
*args,
**kwargs,
):
sampling_fraction = eval_config["sampling_fraction"]
super().__init__(sampling_fraction, logging_connection, *args, **kwargs)
# Custom initialization logic
@log_execution_to_db
def evaluate(self, query: str, context: str, completion: str) -> Any:
# Custom evaluation logic
return {"score": 0.8, "metric": "custom_metric"}
In this example, the CustomEvalPipeline
subclasses the EvalPipeline
and implements the following:
__init__
: Performs custom initialization based on theeval_config
.evaluate
: Implements custom evaluation logic and returns the evaluation result.
Passing the Custom Eval Pipeline to the Factory
To use the custom eval pipeline with the E2EPipelineFactory
, you can pass it when creating the pipeline:
from r2r.main import E2EPipelineFactory
from my_custom_pipeline import CustomEvalPipeline
app = E2EPipelineFactory.create_pipeline(
config=config,
eval_pipeline_impl=CustomEvalPipeline,
)
By passing the CustomEvalPipeline
to the create_pipeline
method using the eval_pipeline_impl
parameter, the factory will use the custom pipeline instead of the default BasicEvalPipeline
.
That's it! You now have a custom eval pipeline that demonstrates how to modify the evaluation logic and can be passed to the E2EPipelineFactory
for use in the end-to-end pipeline.