How do RAG (Retrieval-Augmented Generation) systems work with LLMs?
Hugging Face Forums [Unofficial]
May 20, 2026
As I’m sure you know, RAG is a way of expanding an LLM’s effective knowledge without further training. A RAG system converts relevant documents into vectors (embeddings) and stores them in a vector database. That vectorization is key, since it allows semantic search through the database: documents are matched by meaning rather than by exact keywords.
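To make the vectorize-then-search idea concrete, here is a minimal Python sketch of an in-memory vector store. It assumes a toy `embed` function (an L2-normalized bag-of-words vector) in place of a real embedding model, so it only matches shared words, not meaning. `embed`, `cosine`, and `VectorStore` are hypothetical names for illustration, not a library API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy stand-in for an embedding model (an assumption, not a real API).

    Returns an L2-normalized bag-of-words vector stored sparsely as
    {word: weight}. A real RAG system would use a neural embedding model
    whose dense vectors capture meaning, not just shared words.
    """
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so their dot product is the cosine similarity.
    return sum(w * b.get(word, 0.0) for word, w in a.items())


class VectorStore:
    """Minimal in-memory 'vector database': documents plus their vectors."""

    def __init__(self) -> None:
        self.docs: list[str] = []
        self.vectors: list[dict[str, float]] = []

    def add(self, doc: str) -> None:
        # Vectorize once at indexing time and keep the vector next to the text.
        self.docs.append(doc)
        self.vectors.append(embed(doc))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        # Brute-force nearest-neighbour search: score the query against every stored vector.
        q = embed(query)
        scored = [(cosine(q, v), d) for v, d in zip(self.vectors, self.docs)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = VectorStore()
for doc in [
    "Cats sleep for most of the day.",
    "The stock market closed higher today.",
    "Dogs need daily walks.",
]:
    store.add(doc)

print(store.search("how long do cats sleep", k=1)[0][1])  # prints "Cats sleep for most of the day."
```

Swapping `embed` for a real embedding model, and the brute-force loop for an approximate nearest-neighbour index, is what turns this into a practical vector database; the index-then-search shape stays the same.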
There are two main ways a RAG system can work with an LLM. The first is blind context injection; the second involves tool calls.
1. Blind context injection: The user’s prompt is vectorized and matched against the vectors in the RAG database. The most relevant material is retrieved and injected directly into the model’s context. It’s quick and easy, but the model doesn’t always get the context it needs, because retrieval depends entirely on how well the raw prompt matches the stored documents.
2. Tool call method: The model has a RAG tool that it can call with search queries as arguments, which lets it run semantic searches over the documents you give it. This method requires a model with strong agentic tool-calling capabilities, but it lets the model choose (and refine) its own queries so that it gets the context it actually needs.
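The practical difference between the two methods is who writes the search query. The Python sketch below assumes a tiny hard-coded document list and a word-overlap `retrieve` function standing in for real semantic search; `DOCS`, `retrieve`, `blind_injection_prompt`, `SEARCH_TOOL`, and `handle_tool_call` are all hypothetical names. The tool schema loosely mirrors the JSON-Schema style many chat APIs use for function calling, but exact field names differ between providers, and no real model is called here: the model's tool call is hard-coded.

```python
import json
import re

# Hypothetical document collection; in practice this lives in the vector database.
DOCS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Shipping to Europe usually takes 5 to 7 business days.",
    "Support is available by email at any time.",
]


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, k: int = 1) -> list[str]:
    """Stand-in for semantic search: ranks docs by words shared with the query.
    A real system would compare embedding vectors instead."""
    q = tokenize(query)
    ranked = sorted(DOCS, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


# 1. Blind context injection: retrieval runs on the raw user prompt, and the
#    results are pasted into the context before the model ever sees it.
def blind_injection_prompt(user_prompt: str) -> str:
    context = "\n".join(retrieve(user_prompt, k=2))
    return f"Use the following context to answer.\n\nContext:\n{context}\n\nQuestion: {user_prompt}"


# 2. Tool call method: the model is told about a search tool and decides
#    itself what query to send. Field names here are illustrative only.
SEARCH_TOOL = {
    "name": "search_docs",
    "description": "Semantic search over the user's documents.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}


def handle_tool_call(name: str, arguments_json: str) -> str:
    # Dispatch a tool call emitted by the model; the returned text would be
    # appended to the conversation as a tool message for the model to read.
    if name != SEARCH_TOOL["name"]:
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(arguments_json)
    return "\n".join(retrieve(args["query"], k=1))


print(blind_injection_prompt("How long does shipping take?"))

# Simulated model output: with no real LLM here, the tool call is hard-coded.
print(handle_tool_call("search_docs", json.dumps({"query": "refund policy days"})))
```

In method 1 the retrieval query is always the raw prompt. In method 2 the model can rephrase it (here, `refund policy days`) and, in a real agent loop, call the tool again if the first result is not enough.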
Here’s a YouTube video I watched a while ago on this subject: https://www.youtube.com/watch?v=UabBYexBD4k