Evaluation Pipeline

Eval Pipeline

The Eval Pipeline evaluates the quality of generated completions by comparing each one against its query and context. It supports multiple evaluation providers, and a configurable sampling_fraction controls what share of completions is evaluated.

Eval Provider

The EvalProvider class wraps a single evaluation provider. It can evaluate every completion or only a random sample of them, as determined by the sampling fraction.

Initialization

The EvalProvider is initialized with the following parameters:

  • provider: The name of the evaluation provider (e.g., "deepeval", "parea").
  • sampling_fraction (optional): The fraction of evaluations to perform (default is 1.0, meaning all evaluations are performed).

Evaluation

The evaluate method performs the evaluation of the query, context, and completion using the specified evaluation provider. It takes the following parameters:

  • query: The input query.
  • context: The context used for generating the completion.
  • completion: The generated completion to be evaluated.

The method draws a random value and compares it with the sampling fraction. If the value is less than the fraction, the method calls the provider's actual evaluation method and returns the result. Otherwise, it returns None, meaning no evaluation was performed.
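The sampling check described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the class name SampledEvaluator, the evaluate_fn callback, the injectable rng, and the range validation are all assumptions made for the example.

```python
import random
from typing import Any, Callable, Optional


class SampledEvaluator:
    """Hypothetical stand-in for an eval provider that samples evaluations."""

    def __init__(
        self,
        evaluate_fn: Callable[[str, str, str], Any],
        sampling_fraction: float = 1.0,
        rng: Optional[random.Random] = None,
    ):
        # Range validation is an assumption of this sketch, not documented behavior.
        if not 0.0 <= sampling_fraction <= 1.0:
            raise ValueError("sampling_fraction must be between 0.0 and 1.0")
        self.evaluate_fn = evaluate_fn
        self.sampling_fraction = sampling_fraction
        self.rng = rng or random.Random()

    def evaluate(self, query: str, context: str, completion: str) -> Optional[Any]:
        # random() returns a float in [0.0, 1.0), so a fraction of 1.0
        # always evaluates and a fraction of 0.0 never does.
        if self.rng.random() < self.sampling_fraction:
            return self.evaluate_fn(query, context, completion)
        # Skipped by sampling: no evaluation performed.
        return None
```

With sampling_fraction=0.25, roughly one call in four reaches the underlying evaluation, which keeps costs down when the provider is slow or billed per call.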

Basic Eval Pipeline

The BasicEvalPipeline is a simple implementation of the EvalPipeline abstract base class. It provides a straightforward way to perform evaluations using a specified evaluation provider and configuration.

Initialization

The BasicEvalPipeline is initialized with the following parameters:

  • eval_config: A dictionary containing the evaluation configuration, including the evaluation sampling_fraction and provider.
  • logging_connection (optional): An instance of LoggingDatabaseConnection for logging.

During initialization, the pipeline checks whether the specified evaluation provider is supported (currently "deepeval" and "parea"). If the provider is "deepeval", it attempts to import the DeepEvalProvider and raises an import error if the package is not installed.
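A rough sketch of this check follows. The function name check_provider, the ValueError for unsupported providers, and the probe of the top-level deepeval module (rather than DeepEvalProvider itself) are assumptions for illustration; only the "provider" and "sampling_fraction" keys and the two provider names come from the description above.

```python
import importlib

# Provider names taken from the text above.
SUPPORTED_PROVIDERS = ("deepeval", "parea")


def check_provider(eval_config: dict) -> str:
    """Validate the configured provider and return its name (hypothetical helper)."""
    provider = eval_config["provider"]
    if provider not in SUPPORTED_PROVIDERS:
        # The exception type is an assumption; the source only says the
        # pipeline checks whether the provider is supported.
        raise ValueError(
            f"Unsupported eval provider {provider!r}; "
            f"expected one of {SUPPORTED_PROVIDERS}"
        )
    if provider == "deepeval":
        try:
            # Import lazily so the dependency is only needed when selected.
            importlib.import_module("deepeval")
        except ImportError as exc:
            raise ImportError(
                "The 'deepeval' package is required for this provider "
                "but is not installed."
            ) from exc
    return provider
```

A configuration such as {"provider": "parea", "sampling_fraction": 0.5} passes the check, while an unknown provider name fails immediately instead of surfacing later during evaluation.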

Evaluation

The evaluate method performs the evaluation of the query, context, and completion using the specified evaluation provider. It is decorated with @log_execution_to_db to log the execution to the database.
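To illustrate the general pattern of an execution-logging decorator, here is a self-contained sketch. It is not the actual log_execution_to_db: the decorator name log_execution, the in-memory EXECUTION_LOG list standing in for a database, the recorded fields, and DemoPipeline are all hypothetical.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# In-memory stand-in for a database table (assumption for this sketch).
EXECUTION_LOG: List[Dict[str, Any]] = []


def log_execution(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record each call's method name, result, and duration."""

    @functools.wraps(func)  # keep the wrapped method's name and docstring
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        result = func(*args, **kwargs)
        EXECUTION_LOG.append(
            {
                "method": func.__name__,
                "result": result,
                "seconds": time.perf_counter() - start,
            }
        )
        return result

    return wrapper


class DemoPipeline:
    """Hypothetical pipeline used only to show the decorator in action."""

    @log_execution
    def evaluate(self, query: str, context: str, completion: str) -> Any:
        return {"score": 1.0}
```

Calling DemoPipeline().evaluate(...) returns the evaluation result unchanged and appends one record to EXECUTION_LOG, so callers need no logging code of their own.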

Using a Custom Eval Pipeline

To create a custom eval pipeline, you can subclass the EvalPipeline abstract base class and implement the necessary methods. Here's an example:

from typing import Any, Optional
from r2r.core import EvalPipeline, LoggingDatabaseConnection, log_execution_to_db
 
class CustomEvalPipeline(EvalPipeline):
    def __init__(
        self,
        eval_config: dict,
        logging_connection: Optional[LoggingDatabaseConnection] = None,
        *args,
        **kwargs,
    ):
        # Read the sampling fraction from the config and hand it to the base class.
        sampling_fraction = eval_config["sampling_fraction"]
        super().__init__(sampling_fraction, logging_connection, *args, **kwargs)
        # Custom initialization logic
 
    @log_execution_to_db
    def evaluate(self, query: str, context: str, completion: str) -> Any:
        # Custom evaluation logic goes here; the fixed result below is a placeholder.
        return {"score": 0.8, "metric": "custom_metric"}

In this example, the CustomEvalPipeline subclasses the EvalPipeline and implements the following:

  • __init__: Performs custom initialization based on the eval_config.
  • evaluate: Implements custom evaluation logic and returns the evaluation result.
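The placeholder in evaluate returns a fixed score. As one possible example of custom evaluation logic, the sketch below scores a completion by the share of its words that also appear in the context. The function name context_overlap and the metric itself are illustrative assumptions, not something the library provides.

```python
import re
from typing import Any, Dict, Set


def _tokens(text: str) -> Set[str]:
    # Lowercase the text and split it into a set of word tokens.
    return set(re.findall(r"\w+", text.lower()))


def context_overlap(query: str, context: str, completion: str) -> Dict[str, Any]:
    """Fraction of distinct completion words found in the context.

    `query` is accepted to match the evaluate signature but is not
    used by this simple metric.
    """
    completion_tokens = _tokens(completion)
    if not completion_tokens:
        # An empty completion has nothing grounded in the context.
        return {"score": 0.0, "metric": "context_overlap"}
    shared = completion_tokens & _tokens(context)
    return {
        "score": len(shared) / len(completion_tokens),
        "metric": "context_overlap",
    }
```

The body of CustomEvalPipeline.evaluate could then return context_overlap(query, context, completion) in place of the fixed dictionary.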

Passing the Custom Eval Pipeline to the Factory

To use the custom eval pipeline with the E2EPipelineFactory, you can pass it when creating the pipeline:

from r2r.main import E2EPipelineFactory
from my_custom_pipeline import CustomEvalPipeline
 
# `config` is assumed to be a configuration object created earlier in your application.
app = E2EPipelineFactory.create_pipeline(
    config=config,
    eval_pipeline_impl=CustomEvalPipeline,
)

When you pass CustomEvalPipeline to the create_pipeline method through the eval_pipeline_impl parameter, the factory uses it instead of the default BasicEvalPipeline.

That's it! You now have a custom eval pipeline with its own evaluation logic that can be passed to the E2EPipelineFactory for use in the end-to-end pipeline.