Retrieval-Augmented Generation (RAG) Architecture

    Transformer Architecture
    │
    ├─ Encoder-Only (BERT, RoBERTa)
    │  └─ Embedding Models ← RAG embeddings
    │     ├─ MiniLM (384 dims)
    │     ├─ multi-qa-mpnet (768 dims)
    │     └─ OpenAI ada-002 (1536 dims)
    │
    ├─ Decoder-Only (GPT, Claude, LLaMA)
    │  └─ Large Language Models (LLMs)
    │
    └─ Encoder-Decoder (T5, BART)


Text Embeddings - Cosine Similarity Calculation

similarity = 1 - cosine_distance

MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
model      = SentenceTransformer(MODEL_NAME)

# VECTOR_DIMENSION = 384 is a fixed property of the "all-MiniLM-L6-v2" model.
# Every text input, whether a single word, a sentence, a paragraph, or a
# larger chunk of text, is converted into a vector of exactly 384 numbers.
VECTOR_DIMENSION = 384

# Models and their dimensions
# "all-MiniLM-L6-v2"                      -> 384 dimensions
# "all-mpnet-base-v2"                     -> 768 dimensions
# "all-MiniLM-L12-v2"                     -> 384 dimensions
# "paraphrase-multilingual-MiniLM-L12-v2" -> 384 dimensions


Document Ingestion Techniques for Machine Learning RAG
(Retrieval-Augmented Generation)

This section surveys techniques for ingesting documents into a Retrieval-Augmented Generation (RAG) system. RAG combines the strengths of pre-trained large language models (LLMs) with the ability to retrieve relevant information from external knowledge sources, and effective document ingestion is critical for RAG system performance.

Understanding the Document Ingestion Pipeline

The ingestion pipeline typically consists of these stages:

  1. Loading: Retrieving documents from various sources.
  2. Preprocessing: Cleaning, structuring, and preparing the document text.
  3. Chunking: Dividing the document into smaller, manageable pieces (chunks).
  4. Embedding: Creating vector representations (embeddings) of each chunk.
  5. Indexing: Storing the embeddings and associated metadata in a vector database.
  6. Retrieval: Querying the vector database to find relevant chunks.

1. Document Loading Techniques

2. Preprocessing Techniques

3. Chunking Strategies

Chunking is arguably the most critical aspect of document ingestion. The size and nature of chunks dramatically affect retrieval performance.
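One common strategy is fixed-size chunking with overlap: consecutive chunks share a few words of context, so a sentence split at a chunk boundary is still partially present in both neighbors. A minimal sketch (the function name and parameter defaults are illustrative):

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    assert 0 <= overlap < chunk_size
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + chunk_size]))
        if i + chunk_size >= len(words):
            break  # the final chunk already covers the tail of the text
    return chunks

# A 100-word document yields 3 chunks; adjacent chunks share 10 words.
doc = " ".join(f"w{i}" for i in range(100))
parts = chunk_words(doc)
```

Larger chunks preserve more context per retrieval hit but dilute the embedding over more topics; smaller chunks give sharper matches but may lose surrounding context, which is why the overlap parameter matters.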

4. Embedding Techniques