How Vector Embeddings Work in RAG

1 Text → Vector (Fixed Dimensions)

The multi-qa-mpnet model produces 768 dimensions per text:

  "cat"                  → [0.23, -0.45, 0.12, -0.08, 0.67, ...]   (768 numbers total)
  "dog"                  → [0.18, -0.32, 0.08, 0.15, 0.59, ...]    (768 numbers total)
  "car"                  → [-0.41, 0.22, -0.33, 0.54, -0.12, ...]  (768 numbers total)
  "vitamin C deficiency" → [0.35, 0.61, -0.18, 0.42, 0.09, ...]    (768 numbers total)
Key Point: Every text (word, sentence, paragraph) becomes exactly 768 numbers. The dimension count is fixed by the model, not by your data. 1 word or 1 million documents → same 768 dimensions each.
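
A minimal sketch of the encoding step, assuming the sentence-transformers library and the multi-qa-mpnet-base-cos-v1 checkpoint (the text names the multi-qa-mpnet family but not the exact variant):

```python
from sentence_transformers import SentenceTransformer

# Assumed checkpoint; any multi-qa-mpnet variant produces 768-dim vectors.
model = SentenceTransformer("multi-qa-mpnet-base-cos-v1")

texts = ["cat", "dog", "car", "vitamin C deficiency"]
embeddings = model.encode(texts)

print(embeddings.shape)   # (4, 768): one 768-number vector per text
print(embeddings[0][:5])  # first 5 of the 768 numbers for "cat"
```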
2 Cosine Similarity (Comparing Vectors)

  cat vs dog        → 0.66  Similar (both animals): the vectors point in similar directions in 768-dimensional space
  cat vs smartphone → 0.24  Different (unrelated concepts): the vectors point in different directions in 768-dimensional space
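
Cosine similarity is the dot product of the two vectors divided by the product of their lengths. A self-contained sketch, using the same assumed checkpoint as above (real scores will vary slightly from the illustrative numbers):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("multi-qa-mpnet-base-cos-v1")  # assumed checkpoint

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between the vectors: ~1.0 means same direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cat, dog, phone = model.encode(["cat", "dog", "smartphone"])
print(cosine_similarity(cat, dog))    # higher: related concepts
print(cosine_similarity(cat, phone))  # lower: unrelated concepts
```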
3 RAG System Flow

  📄 Index Documents: chunk your medical documents and convert each chunk to a 768-dim vector (1000 chunks → 1000 vectors stored).
  💬 User Query: convert the user's question to a vector using the same model ("scurvy symptoms" → [0.35, ...]).
  🔍 Vector Search: find the chunks with the highest cosine similarity to the query vector (top 5 chunks: 0.89, 0.84, 0.81...).
  🤖 Generate Answer: pass the retrieved chunks as context to the LLM for answer generation (LLM + context → accurate answer).
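
Putting the four steps together, a toy end-to-end retrieval sketch (same assumed checkpoint, made-up chunks; a real system would store vectors in a vector database rather than an in-memory array, and the final LLM call is left as a placeholder):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("multi-qa-mpnet-base-cos-v1")  # assumed checkpoint

# 1. Index: chunk documents and embed each chunk once (toy chunks here).
chunks = [
    "Scurvy results from prolonged vitamin C deficiency.",
    "Common scurvy symptoms include bleeding gums, bruising, and fatigue.",
    "Vitamin D is synthesized in the skin on sun exposure.",
]
chunk_vectors = model.encode(chunks, normalize_embeddings=True)

# 2. Query: embed the question with the same model.
query_vector = model.encode("scurvy symptoms", normalize_embeddings=True)

# 3. Search: with unit-length vectors, dot product equals cosine similarity.
scores = chunk_vectors @ query_vector
top_k = np.argsort(scores)[::-1][:2]  # indices of the 2 best chunks

# 4. Generate: the retrieved chunks become context for the LLM prompt.
context = "\n".join(chunks[i] for i in top_k)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: scurvy symptoms"
print(prompt)  # pass this to your LLM of choice
```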

💡 Why This Works

The embedding model learned from billions of text examples to place semantically similar text at nearby positions in 768-dimensional space. So "scurvy" and "vitamin C deficiency" end up close together, even though they share no words. That's why semantic search beats keyword matching.
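
You can check this directly. The exact score depends on the model version, but related phrases with no shared words should score well above unrelated ones (same assumed checkpoint as in the sketches above):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("multi-qa-mpnet-base-cos-v1")  # assumed checkpoint
a, b = model.encode(["scurvy", "vitamin C deficiency"], normalize_embeddings=True)
print(float(a @ b))  # high despite zero word overlap; exact value varies
```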