RAG

Retrieval-Augmented Generation (RAG): the LLM retrieves relevant chunks from uploaded documents at query time, once per query. Source: The Karpathy LLM Wiki.

Characteristics

  • Knowledge is rediscovered from scratch on every question.
  • Nothing accumulates or persists between queries.
  • Examples: NotebookLM, ChatGPT file uploads, most RAG systems.

Contrast with LLM Wiki

  • RAG: Ephemeral retrieval.
  • LLM Wiki: Proactive maintenance of a persistent wiki.
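
The contrast can be illustrated with a toy sketch. Everything here is hypothetical and only meant to show where state lives: the class names `EphemeralRetriever` and `PersistentWiki`, the `maintain` method, and the keyword-overlap matching are assumptions for illustration, not how any particular RAG product or the LLM Wiki is actually built.

```python
import re


def tokens(text):
    """Lowercase alphanumeric tokens, as a set (toy matching only)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class EphemeralRetriever:
    """Hypothetical RAG-style lookup: rescans raw documents on every
    question and stores nothing it learns along the way."""

    def __init__(self, documents):
        self.documents = documents

    def context_for(self, question):
        q = tokens(question)
        return [d for d in self.documents if q & tokens(d)]


class PersistentWiki:
    """Hypothetical wiki-style store: findings are written down once,
    kept per topic, and reused by later questions."""

    def __init__(self):
        self.pages = {}

    def maintain(self, topic, finding):
        notes = self.pages.setdefault(topic, [])
        if finding not in notes:  # avoid duplicate notes
            notes.append(finding)

    def context_for(self, topic):
        return list(self.pages.get(topic, []))


docs = ["Invoices are due in 30 days.", "Late invoices incur a 2% fee."]

rag = EphemeralRetriever(docs)
first = rag.context_for("When are invoices due?")
second = rag.context_for("When are invoices due?")  # same work repeated

wiki = PersistentWiki()
for doc in docs:
    wiki.maintain("invoices", doc)  # knowledge accumulates on the page
```

In this sketch the retriever does identical work for both calls and ends up with no new state, while the wiki's `pages` dictionary grows as findings are recorded.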

From Retrieval-Augmented Generation

  • RAG improves LLM output by referencing authoritative knowledge bases outside the model's training data, without retraining the model.
  • Mitigates problems such as hallucinations, static knowledge cut-offs, and lack of source attribution.
  • Pipelines involve three steps: preparing the external data (embeddings), retrieving relevant information (vector search), and augmenting the LLM prompt with what was retrieved.
  • Cost-effective alternative to fine-tuning for injecting domain-specific or real-time data.
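
The pipeline steps listed above (embed, retrieve, augment) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the word-count `embed` function stands in for a real embedding model, the linear scan in `retrieve` stands in for a vector database, and the sample chunks, function names, and prompt wording are all made up for the example. The final LLM call is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real pipeline would call an
    embedding model here (assumption: this is only a stand-in)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(chunks):
    """Step 1: create the external data (chunk plus its embedding)."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, query, k=2):
    """Step 2: vector search, here a brute-force scan over the index."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def augment_prompt(query, retrieved):
    """Step 3: put the retrieved chunks into the prompt, numbered so
    the model can attribute its answer to a source."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved))
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical sample data.
chunks = [
    "The refund window is 30 days from the delivery date.",
    "Support is available on weekdays from 9am to 5pm.",
    "Shipping to Canada takes five to seven business days.",
]
index = build_index(chunks)
query = "How many days is the refund window?"
top = retrieve(index, query, k=1)
prompt = augment_prompt(query, top)  # this string would be sent to the LLM
```

Because the model itself is never modified, updating the knowledge only means rebuilding the index, which is what makes this cheaper than fine-tuning.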