Glossary Summary

This glossary provides concise definitions for key terms related to machine learning (ML), programming in Python, database and vector storage technologies, and concepts central to Retrieval-Augmented Generation (RAG) pipelines.

Retrieval-Augmented Generation (RAG) Pipeline Terms

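  • Adapter Pattern: Design pattern that lets components with incompatible interfaces work together by wrapping one component and translating its calls and data into the form the other expects.
  • Metadata & Session: Metadata is descriptive information about data, such as its source or creation time; a session is a single connection to the database, within which queries and transactions run.
  • Adapter Context & Chunking: The adapter context is the information an adapter carries while transforming data; chunking divides data into smaller pieces for processing.
  • Normalize Embeddings: Scaling each vector to unit (L2) norm so that similarity scores are consistent across vectors; for unit vectors, the dot product equals cosine similarity.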
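
The terms above can be illustrated with a minimal sketch. It is not taken from any particular pipeline: LegacyCsvSource, DocumentAdapter, and the documents() interface are hypothetical names invented for illustration, and character-based chunking is only one of several possible chunking strategies.

```python
import math
from typing import Iterator


class LegacyCsvSource:
    """Hypothetical existing component whose interface the pipeline cannot use directly."""

    def fetch_rows(self) -> list[tuple[str, str]]:
        return [("doc-1", "alpha beta gamma"), ("doc-2", "delta")]


class DocumentAdapter:
    """Adapter: wraps the source and exposes the (assumed) documents() interface."""

    def __init__(self, source: LegacyCsvSource) -> None:
        self._source = source

    def documents(self) -> Iterator[dict]:
        # Translate (id, text) tuples into dicts carrying text plus metadata.
        for doc_id, text in self._source.fetch_rows():
            yield {"text": text, "metadata": {"id": doc_id}}


def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into fixed-size character chunks with optional overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]
```

In this sketch the adapter supplies documents, chunk_text splits each document's text, and normalize would be applied to each chunk's embedding before it is stored.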

Machine Learning Terms

  • Embeddings: Numerical vectors representing data (text, images) so that similar items lie close together in vector space.
  • Sentence Transformers: Python library that produces dense vector representations of text for semantic search and similarity comparison.
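
As a rough sketch of how the two terms above fit together, the following computes cosine similarity between embedding vectors in plain Python and shows one way to obtain embeddings from the sentence-transformers package. The model name "all-MiniLM-L6-v2" is an assumption chosen for illustration, and embed_sentences is a hypothetical helper that needs the package installed and the model downloaded.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def embed_sentences(sentences: list[str], model_name: str = "all-MiniLM-L6-v2"):
    """Hypothetical helper: encode sentences into unit-norm embeddings."""
    # Imported lazily so the similarity function works without the package.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)  # model name is an assumption
    return model.encode(sentences, normalize_embeddings=True)
```

Because embed_sentences asks for normalized embeddings, the dot product of any two returned vectors is the same as their cosine similarity.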

Database and Vector Storage Terms

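  • PostgreSQL: Open-source relational database that supports SQL and can be extended with new data types, functions, and index methods.
  • pgvector: PostgreSQL extension for storing and searching high-dimensional vectors.
  • HNSW & IVFFlat: Index types for approximate nearest neighbor search over vector data. HNSW builds a multilayer graph; IVFFlat partitions vectors into lists and searches only the closest ones.
  • Upsert & Index: An upsert inserts a row, or updates it if it already exists; an index is a data structure that speeds up retrieval.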
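
The storage terms above might be combined as in the sketch below. The chunks table, its columns, the index name, and the 384-dimension vector size are assumptions made for illustration; the cursor is any DB-API cursor that uses %s placeholders (as the psycopg drivers do), and the statements need a PostgreSQL server with pgvector installed.

```python
from typing import Any, Sequence

# One-time setup statements (table layout and dimension are assumptions).
SETUP_STATEMENTS = (
    "CREATE EXTENSION IF NOT EXISTS vector",
    """CREATE TABLE IF NOT EXISTS chunks (
           id        text PRIMARY KEY,
           content   text NOT NULL,
           embedding vector(384)
       )""",
    # HNSW index for cosine distance. An IVFFlat alternative would be:
    #   USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100)
    """CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
           ON chunks USING hnsw (embedding vector_cosine_ops)""",
)

# Upsert: insert a new row, or update the existing row with the same id.
UPSERT_SQL = """
INSERT INTO chunks (id, content, embedding)
VALUES (%s, %s, %s::vector)
ON CONFLICT (id) DO UPDATE
SET content = EXCLUDED.content, embedding = EXCLUDED.embedding
"""

# Nearest neighbors by cosine distance (pgvector's <=> operator).
SEARCH_SQL = """
SELECT id, content FROM chunks
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def to_vector_literal(values: Sequence[float]) -> str:
    """Format a vector in pgvector's text form, e.g. '[1.0,2.5]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def upsert_chunk(cursor: Any, chunk_id: str, content: str,
                 embedding: Sequence[float]) -> None:
    """Insert or update one chunk through a DB-API cursor."""
    cursor.execute(UPSERT_SQL, (chunk_id, content, to_vector_literal(embedding)))


def search_chunks(cursor: Any, query_embedding: Sequence[float], limit: int = 5):
    """Return the rows closest to the query embedding."""
    cursor.execute(SEARCH_SQL, (to_vector_literal(query_embedding), limit))
    return cursor.fetchall()
```

Passing the vector as text and casting it with ::vector avoids depending on a driver-specific type adapter; a dedicated adapter package could be used instead.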

Programming and Python Concepts

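  • Generator: Python function that uses yield to produce items lazily, one at a time; calling it returns an iterator.
  • Context Manager: Object that acquires and releases resources (files, database connections) around a block of code, used with the with statement.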
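
A short sketch of both concepts, using only the standard library. The names in_batches and db_session are hypothetical, and SQLite stands in for a real database purely so the example runs without a server.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterable, Iterator


def in_batches(items: Iterable, batch_size: int) -> Iterator[list]:
    """Generator: lazily yield lists of up to batch_size items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


@contextmanager
def db_session(path: str = ":memory:") -> Iterator[sqlite3.Connection]:
    """Context manager: open a connection, commit on success, always close."""
    conn = sqlite3.connect(path)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


# Usage: insert rows batch by batch inside a managed session.
with db_session() as conn:
    conn.execute("CREATE TABLE items (value INTEGER)")
    for batch in in_batches(range(5), 2):
        conn.executemany("INSERT INTO items VALUES (?)", [(v,) for v in batch])
    row_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

The generator never holds more than one batch in memory, and the connection is closed when the with block exits, even if an exception is raised inside it.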