Natural Language Processing (NLP): LLMs, RAG, FAISS, Embeddings
Machine Learning
Processing Data Files for Large Language Models (LLMs)
Text Embeddings - Cosine Similarity Calculation
similarity = 1 - cosine_distance
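The relation above can be made concrete with a minimal sketch in plain Python. The helper names `cosine_similarity` and `cosine_distance` are illustrative assumptions, not part of any library named in these notes; in practice a vectorized library routine would normally be used.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle to a zero vector is undefined, so fail loudly.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance, defined so that similarity = 1 - cosine_distance."""
    return 1.0 - cosine_similarity(a, b)
```

Vectors pointing the same way give a similarity close to 1 (distance close to 0), orthogonal vectors give a similarity close to 0, and opposite vectors give a similarity close to -1 (distance close to 2).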
from sentence_transformers import SentenceTransformer
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
model = SentenceTransformer(MODEL_NAME)  # reuse the constant instead of repeating the string
# VECTOR_DIMENSION = 384 is a fixed property of the "all-MiniLM-L6-v2" model.
# Any text input (a single word, a sentence, a paragraph, or a chunk of text)
# is converted into a vector of exactly 384 numbers.
# Text longer than the model's maximum sequence length is truncated before encoding.
VECTOR_DIMENSION = 384
# Models and their dimensions
# "all-MiniLM-L6-v2" -> 384 dimensions
# "all-mpnet-base-v2" -> 768 dimensions
# "all-MiniLM-L12-v2" -> 384 dimensions
# "paraphrase-multilingual-MiniLM-L12-v2" -> 384 dimensions
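One way to use the list above is as a lookup table, so that a vector's length can be checked against the model that supposedly produced it before it is stored or compared. This is a sketch only: `MODEL_DIMENSIONS`, `embedding_dimension`, and `check_embedding` are hypothetical names introduced for illustration, and the table holds just the four models listed in these notes.

```python
# Dimensions copied from the list above; extend the table for other models.
MODEL_DIMENSIONS = {
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
    "all-MiniLM-L12-v2": 384,
    "paraphrase-multilingual-MiniLM-L12-v2": 384,
}


def embedding_dimension(model_name):
    """Look up the vector size for a model, with or without an org prefix."""
    # "sentence-transformers/all-MiniLM-L6-v2" -> "all-MiniLM-L6-v2"
    short_name = model_name.split("/")[-1]
    if short_name not in MODEL_DIMENSIONS:
        raise ValueError(f"unknown model: {model_name!r}")
    return MODEL_DIMENSIONS[short_name]


def check_embedding(vector, model_name):
    """Raise ValueError if the vector length does not match the model."""
    expected = embedding_dimension(model_name)
    if len(vector) != expected:
        raise ValueError(
            f"expected {expected} values for {model_name!r}, got {len(vector)}"
        )
    return expected
```

A check like this is cheap and catches the common mistake of mixing vectors from a 384-dimension model with an index or constant sized for a 768-dimension one.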
Natural Language Understanding (NLU)