Retrieval-Augmented Generation (RAG) Workflow
Standard Workflow of RAG
- User Prompt
  - The system receives a user query or prompt as input.
- Embedding Generation
  - The user prompt is tokenized and transformed into a vector representation by an embedding model (e.g., Sentence-BERT or another transformer encoder), as sketched below.
  - This vector captures the semantic meaning of the query.
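A minimal sketch of the embedding step using the sentence-transformers library; the model name and example query are assumptions for illustration, and any encoder that produces fixed-size vectors would work:

```python
from sentence_transformers import SentenceTransformer

# Load a Sentence-BERT-style encoder (the model name is illustrative).
model = SentenceTransformer("all-MiniLM-L6-v2")

# Encode the user prompt into a single dense vector capturing its meaning.
query = "How do I reset my account password?"
query_vector = model.encode(query)  # NumPy array, shape (384,) for this model
```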
- Similarity Search in FAISS
  - The query vector is compared against the stored document vectors in the FAISS index to find the most semantically similar documents, using a metric such as cosine (inner-product) similarity or Euclidean distance.
  - FAISS returns the top-N most relevant results as document IDs with their similarity scores (see the sketch below).
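A sketch of the FAISS search, assuming the document vectors were embedded and indexed offline; `IndexFlatIP` performs exact inner-product search, which equals cosine similarity once the vectors are L2-normalized (the random vectors here stand in for real embeddings):

```python
import faiss
import numpy as np

dim = 384                       # must match the embedding model's output size
index = faiss.IndexFlatIP(dim)  # exact inner-product (cosine after normalization)

# Stand-in for document embeddings built offline; float32 is required by FAISS.
doc_vectors = np.random.rand(1000, dim).astype("float32")
faiss.normalize_L2(doc_vectors)  # normalize so inner product == cosine similarity
index.add(doc_vectors)

# Normalize the query vector the same way, then take the top-N hits.
query_vector = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query_vector)
scores, doc_ids = index.search(query_vector, 5)  # top-5 positions + scores
```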
- Retrieval of Raw Text
  - Because FAISS stores only vectors, the document IDs it returns are used to fetch the corresponding raw text from an external data store (e.g., Elasticsearch, MongoDB), as sketched below.
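What the lookup might look like against MongoDB; the connection string, database, collection, and field names are all assumptions for illustration. FAISS returns row positions, so a position-to-ID map saved alongside the index translates hits back to application-level document IDs:

```python
from pymongo import MongoClient

# Top positions returned by FAISS in the previous step (illustrative values).
faiss_positions = [17, 332, 104]

# Position-to-ID map, assumed to be saved alongside the index at build time.
id_map = {17: "doc-017", 332: "doc-332", 104: "doc-104"}
hit_ids = [id_map[p] for p in faiss_positions]

# Fetch the raw text for those IDs from the document store.
client = MongoClient("mongodb://localhost:27017")
collection = client["rag"]["documents"]
retrieved_texts = [doc["text"] for doc in collection.find({"_id": {"$in": hit_ids}})]
```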
- Contextual Generation
  - The retrieved text and the original user prompt are fed into a generative model (e.g., T5, GPT).
  - The model conditions its output on the combined query and retrieved documents to produce the final response (see the sketch below).
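A sketch of the generation step with a small T5-family model via Hugging Face transformers; the model choice and the prompt template are assumptions, since the exact format is tuned per application:

```python
from transformers import pipeline

# Load a small sequence-to-sequence model (the model name is illustrative).
generator = pipeline("text2text-generation", model="google/flan-t5-small")

# Combine the retrieved passages with the original question into one prompt.
retrieved_texts = ["Passwords can be reset from the account settings page."]
query = "How do I reset my account password?"
prompt = (
    "Answer the question using only the context.\n"
    f"Context: {' '.join(retrieved_texts)}\n"
    f"Question: {query}"
)

answer = generator(prompt, max_new_tokens=64)[0]["generated_text"]
print(answer)
```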
Why Similarity Search Comes First
- Semantic Matching: Ensures the generative model receives semantically relevant content, improving response quality.
- Efficiency: Narrows a large corpus down to a small set of candidate documents, which keeps generation fast and the combined prompt within the model's context window.
When Text Search Might Be Used
- Supplementary Retrieval: Keyword (full-text) search can complement the vector search, for example to apply metadata filters or to match exact terms.
- Hybrid Models:
  - Combine vector similarity search with full-text search to balance semantic and lexical matching (see the sketch after this list).
  - Refine or filter results against more complex conditions after the initial vector search.
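One common hybrid scheme is a weighted sum of the vector similarity and a lexical score such as BM25. Below is a sketch using the rank_bm25 package; the corpus, the vector scores, and the weight alpha are all illustrative assumptions:

```python
from rank_bm25 import BM25Okapi

corpus = [
    "resetting your password from account settings",
    "billing and invoices overview",
    "two-factor authentication setup",
]

# Lexical side: BM25 over whitespace-tokenized documents.
bm25 = BM25Okapi([doc.split() for doc in corpus])
lexical_scores = bm25.get_scores("reset account password".split())

# Semantic side: cosine similarities from the FAISS step, aligned with the
# corpus order (values here are illustrative).
vector_scores = [0.82, 0.10, 0.35]

def minmax(xs):
    """Scale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

# Weighted combination; alpha is a tunable assumption, not a standard value.
alpha = 0.7
combined = [alpha * v + (1 - alpha) * l
            for v, l in zip(minmax(vector_scores), minmax(lexical_scores))]
best = max(range(len(corpus)), key=combined.__getitem__)
print(corpus[best], combined[best])
```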
Conclusion
- Primary Approach: RAG begins with a similarity search against a vector index (e.g., FAISS) to identify semantically relevant documents.
- Text Search: Used as a complementary step for filtering or for more complex retrieval strategies.