Default Configuration
When the example pipeline is created, a default config.json is loaded and passed to the pipeline. It provides settings for the following services:
- Embedding settings
- Vector Database provider
- LLM settings
- Parsing logic
- Evaluation provider
- and more
The default values for the config are shown below:
{
  "embedding": {
    "provider": "openai",
    "model": "text-embedding-ada-002",
    "dimension": 1536,
    "batch_size": 32,
    "text_splitter": {
      "type": "recursive_character",
      "chunk_size": 512,
      "chunk_overlap": 20
    }
  },
  "evals": {
    "provider": "deepeval",
    "sampling_fraction": 1.0
  },
  "language_model": {
    "provider": "litellm"
  },
  "logging_database": {
    "provider": "local",
    "collection_name": "demo_logs",
    "level": "INFO"
  },
  "ingestion": {
    "provider": "local"
  },
  "vector_database": {
    "provider": "local",
    "collection_name": "demo_vecs"
  },
  "app": {
    "max_logs": 100,
    "max_file_size_in_mb": 100
  }
}
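To make the text_splitter defaults concrete, here is a minimal sketch of how chunk_size and chunk_overlap interact. It uses plain fixed-size character windows, which is an assumption made for illustration: the "recursive_character" splitter named in the config is expected to also split on separators, and its actual implementation is not shown in this document. The function name chunk_text is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap.

    Illustrative only: this is not the splitter the pipeline uses.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "a" * 1200
print([len(chunk) for chunk in chunk_text(sample)])  # [512, 512, 216]
```

With the default values, consecutive chunks share their boundary 20 characters, so a sentence that straddles a chunk boundary is less likely to be cut off from its context in both chunks.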
The available options for each section are:
- embedding:
  - provider: "openai", "sentence-transformers" (see Embedding Providers for more info)
  - model: embedding model to use (e.g., "text-embedding-ada-002" for OpenAI)
  - dimension: embedding dimension
  - batch_size: number of texts to embed in each batch
- evals:
  - provider: "deepeval", "parea" (see Evaluation Providers for more info)
  - sampling_fraction: fraction of requests sampled for evaluation (e.g., 1.0 for 100% of requests)
- language_model:
  - provider: "litellm", "openai", "llama-cpp" (see LLM Providers for more info)
- logging_database:
  - provider: "local", "postgres", "redis"
  - collection_name: name of the collection to store logs
  - level: logging level (e.g., "INFO", "DEBUG")
- ingestion:
  - provider: "local"
  - text_splitter: configuration for text splitting
    - type: "recursive_character"
    - chunk_size: target chunk size in characters
    - chunk_overlap: number of overlapping characters between chunks
- vector_database:
  - provider: "local", "pgvector", "qdrant" (see Vector Database Providers for more info)
  - collection_name: name of the collection to store vector embeddings
- app:
  - max_logs: maximum number of logs to retrieve
  - max_file_size_in_mb: maximum file size allowed for ingestion (in MB)
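The option list above can be turned into a quick sanity check before a config is handed to the pipeline. The sketch below is a hypothetical helper, not part of R2R: the provider values are copied from the list above (which may not be exhaustive), and treating sampling_fraction as a number between 0.0 and 1.0 is an assumption based on the "1.0 for 100% of requests" example.

```python
# Provider values copied from the option list above. The checker itself is a
# hypothetical helper for illustration and is not part of R2R.
ALLOWED_PROVIDERS = {
    "embedding": {"openai", "sentence-transformers"},
    "evals": {"deepeval", "parea"},
    "language_model": {"litellm", "openai", "llama-cpp"},
    "logging_database": {"local", "postgres", "redis"},
    "ingestion": {"local"},
    "vector_database": {"local", "pgvector", "qdrant"},
}


def find_config_problems(config: dict) -> list[str]:
    """Return human-readable problems found in a config dict (empty if none)."""
    problems = []
    for section, allowed in ALLOWED_PROVIDERS.items():
        provider = config.get(section, {}).get("provider")
        if provider not in allowed:
            problems.append(
                f"{section}.provider: {provider!r} is not one of {sorted(allowed)}"
            )

    # Assumption: sampling_fraction is a fraction of requests in [0.0, 1.0].
    fraction = config.get("evals", {}).get("sampling_fraction")
    if fraction is not None:
        is_number = isinstance(fraction, (int, float)) and not isinstance(fraction, bool)
        if not is_number or not 0.0 <= fraction <= 1.0:
            problems.append(
                f"evals.sampling_fraction: {fraction!r} is outside [0.0, 1.0]"
            )
    return problems


example = {
    "embedding": {"provider": "openai"},
    "evals": {"provider": "deepeval", "sampling_fraction": 1.5},
    "language_model": {"provider": "litellm"},
    "logging_database": {"provider": "local"},
    "ingestion": {"provider": "local"},
    "vector_database": {"provider": "my-custom-db"},  # made-up value
}
for problem in find_config_problems(example):
    print(problem)
```

A check like this catches a misspelled provider name at load time rather than partway through pipeline construction.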
For more detailed information on configuring specific providers, please refer to the Embedding Providers, Evaluation Providers, LLM Providers, and Vector Database Providers documentation. These guides cover setting up and using each provider, including any required environment variables and additional configuration options.
To launch the default pipeline with your own config, you may run the following code:
class E2EPipelineFactory:
    ...

app = E2EPipelineFactory.create_pipeline(
    # override with your own config.json
    config=R2RConfig.load_config("your_config_path.json"),
)
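One way to produce your_config_path.json is to start from the default values and override only the keys you want to change. The sketch below uses only the Python standard library. The deep_merge and write_custom_config helpers are hypothetical, not part of R2R. Because this document does not say whether R2RConfig.load_config accepts partial files, the sketch writes out a complete merged file.

```python
import copy
import json
from pathlib import Path


def deep_merge(base: dict, overrides: dict) -> dict:
    """Return a copy of base with overrides applied recursively to nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def write_custom_config(defaults: dict, overrides: dict, path: Path) -> dict:
    """Merge overrides into defaults, write the result as JSON, and return it."""
    config = deep_merge(defaults, overrides)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# A subset of the default values shown earlier, kept short for illustration.
# In practice, start from the full default config.json.
defaults = {
    "embedding": {
        "provider": "openai",
        "model": "text-embedding-ada-002",
        "dimension": 1536,
        "batch_size": 32,
    },
    "vector_database": {"provider": "local", "collection_name": "demo_vecs"},
}

# Example overrides: switch the vector store and rename the collection.
overrides = {"vector_database": {"provider": "qdrant", "collection_name": "my_vecs"}}

config = write_custom_config(defaults, overrides, Path("your_config_path.json"))
print(config["vector_database"])
```

Merging recursively keeps sibling keys such as embedding.batch_size intact when only one nested value is overridden, and the defaults dictionary itself is left unmodified.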