Code Green - What is RAG? - Blog Series part 3
Written by: Mahdi Ghazimoradi
In the previous blog in this series, we discussed how an LLM works, specifically within the context of a GPT. In this blog, we’ll take a closer look at Retrieval Augmented Generation (RAG) and answer three questions: “What is it?”, “When should I use it?”, and “How do I implement it?” RAG matters for this series because models often lack awareness of energy optimization techniques or the latest updates. When a model needs to provide guidance on energy efficiency, RAG helps bridge these gaps by integrating relevant external knowledge.
What is Retrieval Augmented Generation (RAG)?
In short, RAG is a modified way for an AI service to produce a response. A standard GPT generates a response based solely on the model’s own understanding of the question you asked. With RAG, relevant information is first retrieved from documents and then integrated into the input prompt (your original question, also referred to as the query).
RAG is a process that combines both retrieval-based models and generative transformer models for tasks like question answering and language modeling. Let’s unpack this a little before moving on.
A retrieval-based model does not generate responses like ChatGPT does. Instead, it selects the correct response from a set of predefined responses. Within the context of RAG, this process works as follows:
- Understand the query: The query is converted into an embedding vector (a dense numerical representation) using a model pre-trained on a large corpus of text.
- Understand the document: Each document in the dataset is also converted into a dense vector using the same method.
- Check documents for relevance: The model calculates similarity scores between the query vector and each document vector.
- Select the most relevant documents: The documents (or sections of documents) with the highest scores are deemed the most relevant to the query.
- Send the prompt to the GPT: These are then supplied as additional context to the transformer model (like GPT), which generates the output response.
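The embedding-and-scoring steps above can be sketched in plain Python. This is a toy illustration, not a real embedding model: it uses word-count vectors and cosine similarity, whereas production systems use dense vectors from a pre-trained model. All function and variable names here are made up for the example.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: a bag-of-words count vector over lowercase tokens.
    # A real RAG system would use a pre-trained embedding model instead.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

documents = [
    "Reduce energy use by batching network requests.",
    "Poetry generation with transformer models.",
    "Profile your code to find energy hotspots.",
]

query = "How do I reduce the energy use of my code?"
query_vec = embed(query)

# Score every document against the query and keep the best matches;
# these would then be supplied to the GPT as additional context.
scores = [(cosine_similarity(query_vec, embed(d)), d) for d in documents]
top = sorted(scores, reverse=True)[:2]
for score, doc in top:
    print(f"{score:.2f}  {doc}")
```

Even this crude similarity measure correctly ranks the energy-related documents above the unrelated one, which is the core idea behind the retrieval step.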
Why RAG is Important
Without RAG, an AI model might produce semantically different responses for queries that are only slightly different. Two similar (but not identical) questions can result in vastly different answers. This is a problem for complex queries where clear answers can be found in existing documents.
The RAG process enables models to utilize a collection of documents to form their responses. It involves two key steps:
- Document Retrieval: A retriever model fetches relevant documents or passages from a large corpus based on the query.
- Answer Generation: The retrieved documents are used as input to a transformer model (like GPT) to generate an answer based on both the initial query and the retrieved content.
This process allows models to produce coherent and consistent responses grounded in a knowledge base, making it especially useful for large-scale, real-world scenarios where AI needs to reference extensive information.
Knowing when to implement RAG
RAG is powerful, but it’s not a one-size-fits-all solution. Knowing when to implement it depends on the problem you’re solving. RAG fills knowledge gaps by leveraging additional datasets. But do you have access to sufficient additional data? Is it structured and relevant?
Where RAG would not shine
- Small datasets: If the dataset is small and can be included in the model’s training data, RAG may be unnecessary. Simple transformer models might suffice.
- Low complexity tasks: If a chatbot only needs to answer predefined questions about a company’s products, a retrieval-based model could work just as well.
- Low budget, high demand applications: RAG is computationally expensive. If speed and cost efficiency are priorities, other approaches might be preferable.
- Creative generative tasks: RAG is best for factual accuracy. If the task involves creativity, such as poetry generation, standard generative models might be more suitable.
Where RAG would be an ideal solution
RAG is useful in:
- Customer Support Chatbots: It allows chatbots to retrieve and integrate relevant information dynamically, making responses more accurate and context-aware.
- Content Generation: RAG can retrieve the latest documentation and generate relevant content. This is useful for keeping programming advice up to date when dealing with fast-evolving libraries.
- Medical and Legal AI Systems: These fields require highly accurate and up-to-date information retrieval, making RAG an essential tool for ensuring reliable responses.
- Code Optimization Assistants: RAG lets an assistant dynamically retrieve the latest insights and techniques from sources such as performance best practices, documentation, and optimization strategies. This keeps its recommendations current with evolving technologies, helping developers optimize their code, reduce resource consumption, and improve overall efficiency without manual updates or reconfiguration.
How to implement RAG
Implementing RAG involves:
- Train a retriever model: Responsible for fetching relevant documents.
- Prepare a document corpus: Documents are converted into embedding vectors and stored in a vector database for efficient retrieval.
- Encode the query: Convert the input question into an embedding vector.
- Retrieve relevant documents: Find the documents with the highest similarity scores.
- Modify the input prompt: Integrate retrieved documents into the input query.
- Generate the response: The modified query is fed into a transformer model to produce a final response.
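The implementation steps above can be tied together in one end-to-end sketch. The `VectorStore` class, the embedding scheme, and the `generate` placeholder are all illustrative assumptions, not a specific library’s API; in practice the store would be a real vector database and `generate` would call an LLM.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: word-count vector (a real system uses a dense model).
    return Counter(re.findall(r"[a-z]+", text.lower()))

def similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """Minimal in-memory vector database: stores (vector, text) pairs."""
    def __init__(self):
        self.entries = []

    def add(self, text: str) -> None:
        # Prepare the corpus: embed each document once at indexing time.
        self.entries.append((embed(text), text))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Encode the query, then return the k highest-scoring documents.
        q = embed(query)
        scored = [(similarity(q, vec), text) for vec, text in self.entries]
        return [text for _, text in
                sorted(scored, key=lambda s: s[0], reverse=True)[:k]]

def build_prompt(query: str, context_docs: list[str]) -> str:
    # Modify the input prompt: prepend the retrieved documents as context.
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    # Placeholder for the transformer model call.
    return f"[LLM response to a {len(prompt)}-character prompt]"

store = VectorStore()
for doc in ["Caching cuts redundant computation and saves energy.",
            "Vectorized operations lower CPU time and power draw."]:
    store.add(doc)

query = "How can caching save energy?"
prompt = build_prompt(query, store.retrieve(query))
print(generate(prompt))
```

Each step of the list maps onto one part of the sketch: the store holds the prepared corpus, `retrieve` encodes the query and scores documents, `build_prompt` modifies the input, and `generate` stands in for the transformer.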
Choosing the right tools depends on factors like the programming language and available libraries. In Python, popular choices for implementing RAG include Haystack and LangChain, while in Java, LangChain4j is a popular option.
Conclusion
Retrieval Augmented Generation represents a significant advancement in how AI systems can leverage existing knowledge bases to provide more accurate and contextually relevant responses. Throughout this blog post, we’ve explored what RAG is, when to use it, and how to implement it.
The key takeaway is that RAG isn’t just another AI buzzword – it’s a practical solution for situations where you need to combine the generative capabilities of large language models with specific, retrievable knowledge. As we continue to see rapid developments in AI technology, RAG will likely play an increasingly important role in building more reliable and accurate AI applications. Its ability to ground AI responses in specific documentation and knowledge bases makes it particularly valuable for enterprise applications where accuracy and consistency are paramount.
What Lies Ahead
In our next blog post, we’ll dive deeper into practical implementations and explore some examples of RAG in action. Stay tuned!
The Project “Code Green - Sustainable Software Development” is supported by the SIDN Fonds. Read more about SIDN Fonds and Project Code Green at www.sidnfonds.nl/projecten/project-code-groen