Glossary Summary
This glossary provides concise definitions for key terms related to machine learning (ML), programming in Python, database and vector storage technologies, and concepts central to Retrieval-Augmented Generation (RAG) pipelines.
Retrieval-Augmented Generation (RAG) Pipeline Terms
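- Adapter Pattern: Design pattern that lets components with incompatible interfaces work together by wrapping one of them and translating its calls or data.
- Metadata & Session: Metadata is information that describes data; a session represents an individual database connection.
- Adapter Context & Chunking: Adapter context is the information an adapter needs to transform data; chunking divides data into smaller pieces for processing.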
- Normalize Embeddings: Process of scaling vectors to unit (L2) norm so that similarity scores are consistent; for unit vectors, the dot product equals cosine similarity.
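
As a rough illustration of the chunking and normalization terms above, here is a minimal sketch using only the standard library. The helper names `chunk_text` and `normalize`, the character-based chunk sizes, and the overlap value are assumptions made for this example, not part of any particular pipeline.

```python
import math


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into character chunks that overlap by `overlap` characters.

    Hypothetical helper: real pipelines often chunk by tokens or sentences.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def normalize(vector):
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


chunks = chunk_text("a" * 100, chunk_size=40, overlap=10)
unit = normalize([3.0, 4.0])
```

With these settings, a 100-character input yields three chunks starting at offsets 0, 30, and 60, and `[3.0, 4.0]` is scaled to a vector of length 1.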
Machine Learning Terms
- Embeddings: Numerical vectors that represent data (text, images) so that similar items lie close together in vector space.
- Sentence Transformers: Python library for computing dense vector representations of text, used for semantic search and similarity.
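
To make "capturing similarity" concrete, the sketch below ranks a few items by cosine similarity. The three-dimensional vectors are invented for illustration and are not output from Sentence Transformers or any other model; real embeddings typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values chosen by hand).
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

query = embeddings["cat"]
# Sort the labels from most to least similar to the query vector.
ranked = sorted(
    embeddings,
    key=lambda label: cosine_similarity(query, embeddings[label]),
    reverse=True,
)
```

Here `ranked` places "kitten" directly after "cat" and "invoice" last, because the first two vectors point in nearly the same direction.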
Database and Vector Storage Terms
- PostgreSQL: Open-source, extensible relational database queried with SQL.
- pgvector: Extension for storing/searching high-dimensional vectors in PostgreSQL.
- HNSW & IVFFlat: Index types for efficient approximate nearest neighbor search over vector data, both available in pgvector.
- Upsert & Index: An upsert inserts a row or updates it if it already exists; an index is a data structure that speeds up retrieval.
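
The upsert and index terms can be sketched without a running database server. The example below uses Python's built-in `sqlite3` module as a stand-in, not PostgreSQL or pgvector; the `documents` table, its columns, and the `upsert_document` helper are hypothetical. SQLite (3.24 or later) accepts an `ON CONFLICT ... DO UPDATE` clause modeled on PostgreSQL's.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (doc_id TEXT PRIMARY KEY, content TEXT, source TEXT)"
)
# An ordinary index that speeds up lookups by source.
conn.execute("CREATE INDEX idx_documents_source ON documents (source)")


def upsert_document(conn, doc_id, content, source):
    """Insert a row, or update it when doc_id already exists."""
    conn.execute(
        """
        INSERT INTO documents (doc_id, content, source)
        VALUES (?, ?, ?)
        ON CONFLICT (doc_id) DO UPDATE
        SET content = excluded.content, source = excluded.source
        """,
        (doc_id, content, source),
    )


upsert_document(conn, "a1", "first draft", "wiki")
upsert_document(conn, "a1", "revised text", "wiki")  # updates, does not duplicate
rows = conn.execute("SELECT doc_id, content FROM documents").fetchall()
```

After both calls the table holds a single row for `a1` with the revised content. In pgvector, vector indexes are created with statements of the form `CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);`, where `items` and `embedding` are placeholder names.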
Programming and Python Concepts
- Generator: Iterable in Python using `yield` for lazy item generation.
- Context Manager: Manages resources (files, database connections) using the `with` statement.
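
A short sketch of both Python concepts follows. The names `iter_batches` and `managed_resource` are hypothetical, and the "resource" is only a log of open and close events standing in for a file or database connection.

```python
from contextlib import contextmanager


def iter_batches(items, batch_size):
    """Generator: lazily yield lists of up to batch_size items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter batch


events = []  # records when the stand-in resource is opened and closed


@contextmanager
def managed_resource(name):
    """Context manager: set up on entry, always clean up on exit."""
    events.append(f"open {name}")
    try:
        yield name
    finally:
        events.append(f"close {name}")


with managed_resource("demo") as resource:
    batches = list(iter_batches(range(5), 2))
```

The generator produces each batch only when the consumer asks for it, and the `finally` clause ensures the close step runs even if the body of the `with` block raises.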