How Vector Embeddings Work in RAG

1 Text → Vector (Fixed Dimensions)

The multi-qa-mpnet model produces 768 dimensions per text:

  "cat"                  → [0.23, -0.45, 0.12, -0.08, 0.67, ...]   (768 numbers total)
  "dog"                  → [0.18, -0.32, 0.08, 0.15, 0.59, ...]    (768 numbers total)
  "car"                  → [-0.41, 0.22, -0.33, 0.54, -0.12, ...]  (768 numbers total)
  "vitamin C deficiency" → [0.35, 0.61, -0.18, 0.42, 0.09, ...]    (768 numbers total)
Key Point: Every text (word, sentence, paragraph) becomes exactly 768 numbers. The dimension count is fixed by the model, not by your data. 1 word or 1 million documents → same 768 dimensions each.
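
A minimal sketch of the encoding step, assuming the sentence-transformers library and the multi-qa-mpnet-base-cos-v1 checkpoint (the text names the multi-qa-mpnet family but not the exact variant):

```python
from sentence_transformers import SentenceTransformer

# Assumed checkpoint; any multi-qa-mpnet variant produces 768-dim vectors.
model = SentenceTransformer("multi-qa-mpnet-base-cos-v1")

texts = ["cat", "dog", "car", "vitamin C deficiency"]
embeddings = model.encode(texts)

print(embeddings.shape)   # (4, 768): one 768-number vector per text
print(embeddings[0][:5])  # first 5 of the 768 numbers for "cat"
```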
2 Cosine Similarity (Comparing Vectors)

  cat vs dog        → 0.66  Similar (both animals): the vectors point in similar directions in 768-dimensional space
  cat vs smartphone → 0.24  Different (unrelated concepts): the vectors point in different directions in 768-dimensional space
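
Cosine similarity is the dot product of the two vectors divided by the product of their lengths. A self-contained sketch, using the same assumed checkpoint as above (real scores will vary slightly from the illustrative numbers):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("multi-qa-mpnet-base-cos-v1")  # assumed checkpoint

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between the vectors: ~1.0 means same direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cat, dog, phone = model.encode(["cat", "dog", "smartphone"])
print(cosine_similarity(cat, dog))    # higher: related concepts
print(cosine_similarity(cat, phone))  # lower: unrelated concepts
```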
3 RAG System Flow

  📄 Index Documents: chunk your medical documents and convert each chunk to a 768-dim vector (1000 chunks → 1000 vectors stored).
  💬 User Query: convert the user's question to a vector using the same model ("scurvy symptoms" → [0.35, ...]).
  🔍 Vector Search: find the chunks with the highest cosine similarity to the query vector (top 5 chunks: 0.89, 0.84, 0.81...).
  🤖 Generate Answer: pass the retrieved chunks as context to the LLM for answer generation (LLM + context → accurate answer).
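
Putting the four steps together, a toy end-to-end retrieval sketch (same assumed checkpoint, made-up chunks; a real system would store vectors in a vector database rather than an in-memory array, and the final LLM call is left as a placeholder):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("multi-qa-mpnet-base-cos-v1")  # assumed checkpoint

# 1. Index: chunk documents and embed each chunk once (toy chunks here).
chunks = [
    "Scurvy results from prolonged vitamin C deficiency.",
    "Common scurvy symptoms include bleeding gums, bruising, and fatigue.",
    "Vitamin D is synthesized in the skin on sun exposure.",
]
chunk_vectors = model.encode(chunks, normalize_embeddings=True)

# 2. Query: embed the question with the same model.
query_vector = model.encode("scurvy symptoms", normalize_embeddings=True)

# 3. Search: with unit-length vectors, dot product equals cosine similarity.
scores = chunk_vectors @ query_vector
top_k = np.argsort(scores)[::-1][:2]  # indices of the 2 best chunks

# 4. Generate: the retrieved chunks become context for the LLM prompt.
context = "\n".join(chunks[i] for i in top_k)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: scurvy symptoms"
print(prompt)  # pass this to your LLM of choice
```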

💡 Why This Works

The embedding model learned from billions of text examples to place semantically similar text at nearby positions in 768-dimensional space. So "scurvy" and "vitamin C deficiency" end up close together, even though they share no words. That's why semantic search beats keyword matching.
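
You can check this directly. The exact score depends on the model version, but related phrases with no shared words should score well above unrelated ones (same assumed checkpoint as in the sketches above):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("multi-qa-mpnet-base-cos-v1")  # assumed checkpoint
a, b = model.encode(["scurvy", "vitamin C deficiency"], normalize_embeddings=True)
print(float(a @ b))  # high despite zero word overlap; exact value varies
```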