Author – Cen Lu
Imagine you would like to write an essay about quantum computing, but your knowledge of quantum computing comes only from your high school textbooks. You know how to write a good paper, but your knowledge is limited. Now imagine you could access any library in the world while writing. That would make your work far easier, and it’s essentially what retrieval-augmented generation (RAG) does for large language models (LLMs).
The Knowledge Problem: When Smart Isn’t Enough
LLMs like ChatGPT are extremely impressive: they can write, reason and even code. But they also have a severe limitation: they only “know” what they learned during training. Just like our student trying to write about quantum computing, an LLM has impressive reasoning abilities but limited knowledge. It’s like a brilliant person who went into a coma in 2023 and just woke up: smart, but unaware of anything that happened while unconscious. If you asked an LLM trained only on data up to 2023 about the newest quantum processor, it simply wouldn’t know. Even worse, LLMs can sometimes “hallucinate”. You have probably experienced this when using ChatGPT: it can state facts that sound plausible but are completely wrong.
Enter RAG: When Retrieval Meets Generation
RAG solves this by giving LLMs access to external knowledge bases in real-time. Going back to our student example, RAG is like giving them access to the latest quantum computing journals, research papers and textbooks right when they need them. Think of it as equipping your LLM with an encyclopedia in which it can instantly find relevant information. The name explains everything: Retrieval means finding relevant information in external sources, Augmented means boosting the model’s internal knowledge with the retrieved context, and Generation means creating a response that uses both the retrieved information and the model’s inherent capabilities.
RAG systems work through a two-step process that mirrors how humans do research. Step 1 is the search phase (retrieval). When you ask a question, the LLM doesn’t immediately start generating an answer. Instead, it first asks: “What information do I need to answer this question well?” It analyses your question to understand what type of information is needed. For instance, if you ask “What are the latest quantum processors in 2025?”, the RAG system first needs to gather accurate information about recent processor releases from companies. Then it searches the knowledge base you pointed it at (e.g., quantum computing educational material and scientific papers) using vector similarity (see “How AI Learns to “Read” Like Humans”). Finally, the retrieved documents are ranked by how relevant they are to your question. The simplified formula is:
Similarity Score = cosine_similarity(query_vector, document_vector)
With this formula, the retrieval step can actually capture what you’re asking about. So when you search for “quantum computing applications”, an LLM with RAG is “smart” enough to pull up articles about quantum cryptography, quantum machine learning or quantum simulation too, because the embeddings reflect how closely these topics are related.
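The ranking step above can be sketched in a few lines of Python. This is a minimal illustration, not a production retriever: the three-dimensional “embeddings” and document titles below are made up for the example, and real systems use learned embedding models with hundreds of dimensions plus a vector database for fast search.

```python
import math

def cosine_similarity(a, b):
    # similarity = (a · b) / (|a| * |b|), the formula from the text
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vector, documents):
    # documents: list of (title, vector) pairs; highest similarity first
    scored = [(title, cosine_similarity(query_vector, vec))
              for title, vec in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hand-made toy vectors standing in for a real embedding model.
query = [0.9, 0.1, 0.0]  # "quantum computing applications"
docs = [
    ("quantum cryptography",      [0.8, 0.2, 0.1]),
    ("sourdough baking tips",     [0.0, 0.1, 0.9]),
    ("quantum machine learning",  [0.7, 0.3, 0.0]),
]

for title, score in rank_documents(query, docs):
    print(f"{title}: {score:.3f}")
```

The two quantum-related documents come out on top and the unrelated one falls to the bottom, which is exactly the behaviour the similarity score is meant to produce.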
Then comes the magic in Step 2, the generation part. This is where the LLM takes information from different sources and combines it into something that actually makes sense. It’s not just copy-pasting: the model attempts to create a thoughtful, coherent response that draws on everything it found. For example, when asked about the latest quantum processors in 2025, the RAG system might retrieve papers and articles from companies such as Google, IBM or Microsoft, and then synthesise this information into a comprehensive overview.
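Under the hood, the generation step usually amounts to prompt assembly: the retrieved passages are pasted into the model’s context ahead of the user’s question, and the LLM writes its answer from there. The sketch below shows only that assembly step; the passage texts and the prompt wording are made-up placeholders, and real systems add source metadata, truncation and deduplication before calling the model.

```python
def build_rag_prompt(question, retrieved_passages):
    """Assemble an augmented prompt: retrieved context first, then the question."""
    context = "\n\n".join(
        f"[Source {i + 1}] {text}"
        for i, text in enumerate(retrieved_passages)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite sources as [Source N].\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical retrieved snippets, for illustration only.
passages = [
    "Vendor A announced a 1,000-qubit superconducting processor in early 2025.",
    "Vendor B's 2025 roadmap focuses on error-corrected logical qubits.",
]
prompt = build_rag_prompt(
    "What are the latest quantum processors in 2025?", passages
)
print(prompt)
```

Because the instructions ask the model to cite `[Source N]` tags, the generated answer can point back to where each claim came from, which is what enables the self-referencing behaviour described next.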
What’s really cool about modern RAG systems is that they’ve gotten pretty good at fact-checking themselves. They can cross-reference information from multiple sources, spot when things don’t add up and even tell you where they got their information from. It’s like having a research assistant who not only finds great sources but also keeps track of where everything came from, and that means no more weird claims without references!
Why RAG Matters: Better Answers, Less Guessing
So, why should we care about RAG? Because it makes LLMs more trustworthy. Without RAG, LLMs might give you outdated info or make something up if it doesn’t know the answer. With RAG, it’s like having an expert who double-checks the facts before explaining something to you. For example, if you ask about a new movie that came out last week, a regular LLM might not know anything about it if it wasn’t in its training data. But LLMs with RAG can find reviews or news articles and tell you what’s up.
What’s Next for RAG?
RAG is already capable, but it’s getting even better. Developers are working on making it faster and smarter, so LLMs can search bigger libraries of information without slowing down. They’re also teaching RAG systems to better understand complex questions, like “How does climate change affect small towns in Europe?”. That’s not just one topic; it’s a mix of science, geography and local issues, and RAG is learning to connect all those points.
Finally, as a famous phrase goes, intelligence is not the ability to store information, but the ability to know where to find it. We’re not building walking encyclopedias; instead, we’re creating LLM experts who can connect knowledge and facts to answers, shifting from memorisation to providing references and from hallucination to accuracy.
Further reading/watching/listening:
Videos & Podcasts:
Lewis, P. (2020, October 22). #100 Dr. PATRICK LEWIS – Retrieval Augmented Generation: https://www.youtube.com/watch?v=Dm5sfALoL1Y.
Image Attribution
Generated by: DALL·E 3
Date: 25/06/2025
Prompt: “Modern office scene with a person asking a question to an AI assistant on a computer screen. Behind the AI, show a visual representation of knowledge retrieval-clear floating document icons, database symbols and search beams connecting to a large digital library in the background. Clean, professional illustration style with blue and green color scheme.”