Natural Language Processing (NLP): LLMs, RAG, FAISS, Embeddings


Machine Learning


Processing Data Files for Large Language Models (LLMs)


Text Embeddings - Cosine Similarity Calculation

# Cosine similarity and cosine distance are complements:
# similarity = 1 - cosine_distance
from sentence_transformers import SentenceTransformer

MODEL_NAME       = "sentence-transformers/all-MiniLM-L6-v2"
model            = SentenceTransformer(MODEL_NAME)

# VECTOR_DIMENSION = 384 is a fixed property of the "all-MiniLM-L6-v2" model.
# Any text input (a single word, a sentence, a paragraph, or a chunk of text)
# is converted into a vector of exactly 384 numbers, regardless of its length.
# Note: by default this model truncates input longer than 256 word pieces.
VECTOR_DIMENSION = 384

# Models and their dimensions
# "all-MiniLM-L6-v2"                      -> 384 dimensions
# "all-mpnet-base-v2"                     -> 768 dimensions
# "all-MiniLM-L12-v2"                     -> 384 dimensions
# "paraphrase-multilingual-MiniLM-L12-v2" -> 384 dimensions
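
The relation similarity = 1 - cosine_distance noted above can be checked with a minimal sketch in plain Python. The helper names cosine_similarity and cosine_distance and the toy 3-dimensional vectors are assumptions made for illustration; they stand in for real 384-dimensional embeddings and are not part of any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Complement of cosine similarity: distance = 1 - similarity."""
    return 1.0 - cosine_similarity(a, b)

# Toy vectors standing in for 384-dimensional embeddings (illustrative only)
v1 = [1.0, 0.0, 0.0]
v2 = [0.0, 1.0, 0.0]   # orthogonal to v1
v3 = [2.0, 0.0, 0.0]   # same direction as v1, different length

print(cosine_similarity(v1, v3))  # same direction, so length does not matter
print(cosine_similarity(v1, v2))  # orthogonal vectors
print(cosine_distance(v1, v2))
```

With the real model, the two vectors would typically come from model.encode(text), and the same functions apply unchanged because only the vector length (384 here) differs.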



Natural Language Understanding (NLU)