Code Green - RAG in action - Blog Series part 4
Written by: Mahdi Ghazimoradi
In our previous blog, we explored the fundamentals of Retrieval Augmented Generation (RAG) and its core concepts. Now, let's dive deeper into the practical implementation aspects, specifically focusing on building RAG applications using LangChain4j. This powerful Java framework offers various RAG architectures and approaches that can significantly enhance your AI applications.
As part of our ongoing “Code Green” project to develop energy consumption optimization tools, understanding and implementing the right RAG approach is crucial. Our tools—both an IntelliJ IDEA plugin and a web application (which will be discussed in more detail in the next blog post)—rely on different RAG implementations to provide up-to-date recommendations about energy efficiency in software projects. By exploring different RAG options in LangChain4j, we aim to identify the most suitable approaches for our specific tools, ensuring they can effectively analyze code and provide actionable, context-aware optimization suggestions.
Whether you’re looking to implement basic document retrieval or create more sophisticated multi-vector RAG systems, this guide will walk you through the different types of RAG implementations available in LangChain4j and how to choose the right approach for your specific use case.
LangChain4j
LangChain4j provides three distinct approaches to implementing RAG:
1. Easy RAG
- Definition: The simplest way to implement RAG, ideal for cases where simplicity and quick results matter.
- Characteristics:
- Retrieval: Uses pre-built retrievers that fetch relevant documents using embeddings or basic search mechanisms.
- Integration: Minimal configuration to combine retrieval and generation.
- Complexity: Low complexity; suitable for straightforward use cases.
- Pros:
- Quick and easy to implement.
- No need for sophisticated ranking or multiple sources.
- Cons:
- Limited flexibility and optimization options.
- Retrieval results may not be highly accurate for complex queries.
- Use Cases:
- Small-scale applications with localized datasets.
- Prototyping or testing retrieval-augmented workflows.
2. Naive RAG
- Definition: Combines retrieval and generation but in a more basic manner without advanced refinement.
- Characteristics:
- Retrieval: Retrieves documents based on embeddings or keyword-based search.
- Processing: Fetches documents from a single source without additional ranking or filtering.
- Output: Directly uses retrieved documents as input to the generator.
- Pros:
- Easy to implement for slightly more advanced setups compared to Easy RAG.
- Useful for small to medium-sized datasets.
- Cons:
- Struggles with multiple sources or ambiguous queries.
- No advanced scoring or re-ranking of retrieved documents.
- Use Cases:
- Single-source document retrieval systems.
- Early-stage development of RAG pipelines.
3. Advanced RAG
- Definition: A more powerful and customizable approach that involves multiple retrieval sources, advanced scoring, and re-ranking mechanisms.
- Characteristics:
- Retrieval: Fetches documents from multiple sources using advanced techniques like dense embeddings or hybrid search.
- Ranking: Implements re-ranking to score and prioritize retrieved documents.
- Augmentation: May include filtering, summarization, or domain-specific preprocessing of retrieved content.
- Feedback Loop: Allows iterative improvement of retrieval and generation through user feedback.
- Pros:
- High relevance and accuracy in responses.
- Flexible and scalable for large, multi-source datasets.
- Well-suited for domain-specific knowledge systems.
- Cons:
- More complex to implement and configure.
- Requires significant resources for scaling.
- Use Cases:
- Enterprise-grade search systems.
- Applications requiring integration with multiple, diverse data sources.
Summary Table
Feature | Easy RAG | Naive RAG | Advanced RAG |
---|---|---|---|
Retrieval | Pre-built retrievers | Basic embeddings, single-source | Multi-source, advanced embeddings |
Ranking | None | Basic (first N results) | Advanced re-ranking and scoring |
Complexity | Minimal | Low to moderate | High |
Scalability | Local/small-scale | Small to medium | Large-scale, multi-source |
Use Case | Quick setups, prototypes | Simple retrieval pipelines | Robust production applications |
Easy RAG is straightforward to implement, whereas Naive RAG offers a few more options in its implementation and use, so we'll go through both in practical examples, followed by a more in-depth look at Advanced RAG. The reason for the deeper treatment is that there are many different ways to implement Advanced RAG. There is no one way to do it; how you implement it depends on what exactly your situation calls for, as you'll see once we get there.
Implementation
In the following examples, we will use LangChain4j in Java to implement a retrieval-augmented generation (RAG) system. You can use different AI models, such as OpenAI models, local models, or others; in this case, we will focus on local models served by Ollama. Before starting any implementation, ensure that your preferred AI model is running locally via Ollama. To get started, visit Ollama's website, install the Ollama software, download your preferred models (in the following examples, we will use 'llama3.1' as the chat model and 'nomic-embed-text' as the embedding model), and follow the instructions provided on the site to run them locally.
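Before wiring up any LangChain4j code, it can be worth a quick sanity check that the Ollama server is actually reachable. Here is a minimal sketch, assuming Ollama's default address of http://localhost:11434 (its root endpoint replies with a short status message along the lines of "Ollama is running"):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaHealthCheck {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434")) // default Ollama address
                .GET()
                .build();

        // expect HTTP 200 and a body along the lines of "Ollama is running"
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + ": " + response.body());
    }
}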
Dependencies
For the following examples, we will use the two Maven dependencies listed below:
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-easy-rag</artifactId>
<version>0.36.2</version>
</dependency>
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-ollama</artifactId>
<version>0.36.2</version>
</dependency>
Easy RAG
This example demonstrates how to implement an “Easy RAG” (Retrieval-Augmented Generation) application. By “easy” we mean that we won’t dive into all the details about parsing, splitting, embedding, etc. All the “magic” is hidden inside the “langchain4j-easy-rag” module.
Easy RAG Steps:
- Take the user’s query as-is.
- Create a simple Content Retriever from the Embedding Store; the next steps happen automatically with minimal configuration.
- The query will be used to search the embedding store (containing small segments of your documents) for the X most relevant segments.
- The found segments will be appended to the user's query.
- The combined input (user query + segments) will be sent to the LLM.
Let’s walk through a practical example of Easy RAG:
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

import java.util.List;

public class EasyRAGExample {

    public static void main(String[] args) {
        // creates some documents for us from the given input strings
        List<Document> documents = List.of(
                Document.from("Latest version of ABC programming language is 23 and it is more energy efficient than previous versions."),
                Document.from("Latest version of XYZ framework is 3.3.5 and it is more energy efficient than previous versions.")
        );

        // stores embeddings locally in memory
        InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();

        // converts the raw text documents into embeddings and stores them in the embedding store;
        // with the langchain4j-easy-rag module on the classpath, a bundled in-process embedding
        // model is picked up automatically
        EmbeddingStoreIngestor.ingest(documents, embeddingStore);

        // configures a locally hosted AI model (e.g., Llama 3.1) running on http://localhost:11434
        ChatLanguageModel chatModel = OllamaChatModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("llama3.1")
                .build();

        // connects the chat model and content retriever
        Assistant assistant = AiServices.builder(Assistant.class)
                .chatLanguageModel(chatModel)
                // retrieves the most relevant segments from the embedding store based on the user's query,
                // enabling the model to respond with contextualized and semantically relevant answers
                .contentRetriever(EmbeddingStoreContentRetriever.from(embeddingStore))
                .build();

        System.out.println(assistant.chat("What are the latest versions of ABC and XYZ?"));
    }

    interface Assistant {
        String chat(String userMessage);
    }
}
Here is an example response from the model:
Here are the latest versions:
ABC: Version 23
XYZ: Version 3.3.5
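In a real project, the knowledge base will usually live in files rather than inline strings. Since the langchain4j-easy-rag module bundles an Apache Tika based document parser that is discovered automatically, loading a whole directory stays a one-liner. Here is a minimal sketch that would slot into the main method above, assuming a hypothetical src/main/resources/docs folder:

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

import java.util.List;

// loads every file from the directory; the Tika-based parser bundled with
// langchain4j-easy-rag is picked up automatically, so formats such as PDF
// and Word work out of the box (the path below is illustrative)
List<Document> documents = FileSystemDocumentLoader.loadDocuments("src/main/resources/docs");

// same in-memory store and ingestion as above
InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
EmbeddingStoreIngestor.ingest(documents, embeddingStore);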
Additional Resources
In the upcoming examples, we’ll utilize an additional file that provides extra context for imaginary scenarios. Before exploring Naive and Advanced RAG methods, save the file Green_Secure_Shield.txt (the terms and conditions of the fictional Green Secure Shield Insurance) in a designated location, such as the resources folder, for future reference.
Naive RAG
This example demonstrates how to implement a Naive Retrieval-Augmented Generation (RAG) application. By “naive”, we mean that we won’t use any advanced RAG techniques. In each interaction with the Large Language Model (LLM), we will:
- Take the user’s query as-is.
- Embed it using an embedding model.
- Use the query’s embedding to search an embedding store (containing small segments of your documents) for the X most relevant segments.
- Append the found segments to the user’s query.
- Send the combined input (user query + segments) to the LLM.
This approach assumes that:
- The user’s query is well-formulated and contains all necessary details for retrieval.
- The found segments are relevant to the user’s query.
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.DocumentSplitter;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.ollama.OllamaEmbeddingModel;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

import java.nio.file.Path;
import java.util.List;

public class NaiveRAGExample {

    public static final String OLLAMA_URL = "http://localhost:11434";
    public static final String MODEL_NAME = "llama3.1";
    public static final String EMBEDDING_MODEL_NAME = "nomic-embed-text";

    public static void main(String[] args) {
        // loads the document we want to use for RAG: the terms of use of an imaginary
        // insurance company, "Green Secure Shield Insurance"
        Document document = FileSystemDocumentLoader.loadDocument(Path.of("src/main/resources/Green_Secure_Shield.txt"));

        // splits the document into smaller segments, also known as "chunks", so that only
        // relevant segments are sent to the LLM in response to a user query, rather than
        // the entire document
        DocumentSplitter splitter = DocumentSplitters.recursive(300, 0);
        List<TextSegment> segments = splitter.split(document);

        // embeds (or "vectorizes") the document segments, a process essential for performing
        // similarity searches; here the embedding model (nomic-embed-text) generates the embeddings
        OllamaEmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
                .baseUrl(OLLAMA_URL)
                .modelName(EMBEDDING_MODEL_NAME)
                .build();
        List<Embedding> embeddings = embeddingModel.embedAll(segments).content();

        // holds the embeddings locally
        EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
        embeddingStore.addAll(embeddings, segments);

        // retrieves the top segments from the embedding store based on semantic similarity to the query
        ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
                .embeddingStore(embeddingStore)
                .embeddingModel(embeddingModel)
                .maxResults(2) // limits the number of retrieved segments to the top 2
                .minScore(0.5) // sets a similarity threshold of 0.5 to filter low-relevance results
                .build();

        // configures a locally hosted AI model (e.g., Llama 3.1) running on http://localhost:11434
        ChatLanguageModel chatModel = OllamaChatModel.builder()
                .baseUrl(OLLAMA_URL)
                .modelName(MODEL_NAME)
                .build();

        // connects the chat model and content retriever
        Assistant assistant = AiServices.builder(Assistant.class)
                .chatLanguageModel(chatModel)
                .contentRetriever(contentRetriever)
                .build();

        System.out.println(assistant.chat("I'm under 18, can I use Green Secure Shield Insurance Company insurances?"));
    }

    interface Assistant {
        String chat(String userMessage);
    }
}
Here is an example response from the model:
Unfortunately, based on the terms and conditions provided, it appears that you are not eligible to use Green Secure Shield Insurance Company's insurance services because you are under 18 years old.
According to section 2.1, their insurance products are available only to individuals aged 18 or older.
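The two knobs that most affect Naive RAG quality in this setup are the splitter and the retriever thresholds. As a small sketch that reuses the embeddingStore and embeddingModel from the example above (the numbers are illustrative, not tuned recommendations), giving the recursive splitter some overlap keeps sentences that straddle a chunk boundary retrievable, and a stricter retriever trades recall for relevance:

// 300-character segments with a 30-character overlap, so text that falls on
// a chunk boundary appears in both neighboring segments
DocumentSplitter splitter = DocumentSplitters.recursive(300, 30);

// a stricter retriever: more candidate segments, but a higher relevance bar
ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)
        .embeddingModel(embeddingModel)
        .maxResults(5) // consider up to 5 segments instead of 2
        .minScore(0.7) // drop weakly related matches
        .build();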
Advanced RAG
Advanced RAG can be implemented in LangChain4j using the following core components:
- QueryTransformer
- QueryRouter
- ContentRetriever
- ContentAggregator
- ContentInjector
The process is as follows:
- The user produces a UserMessage, which is converted into a Query.
- The QueryTransformer transforms the Query into one or more Query objects.
- Each Query is routed by the QueryRouter to one or more ContentRetrievers.
- Each ContentRetriever retrieves relevant Contents for each Query.
- The ContentAggregator combines all retrieved Contents into a single final ranked list.
- This list of Contents is injected into the original UserMessage.
- Finally, the UserMessage, containing the original query along with the injected relevant content, is sent to the LLM.
This is the general Advanced RAG flow, but as mentioned earlier, there are different ways to implement Advanced RAG, such as:
- Query Compression (sketched after this list)
- Re-Ranking
- Including Metadata
- Metadata Filtering
- Web Search
- Query Routing
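To give a taste of these techniques before we settle on Query Routing, here is how Query Compression can be wired in. This is a minimal sketch, reusing a chatModel and contentRetriever configured as in the examples above: the CompressingQueryTransformer asks the LLM to rewrite the user's query, together with the chat memory, into a single self-contained query before retrieval, which helps with follow-up questions such as "And how much does it cost?".

import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.query.transformer.CompressingQueryTransformer;
import dev.langchain4j.rag.query.transformer.QueryTransformer;

// compresses the user's query plus the conversation so far into a
// standalone query before it hits the retriever
QueryTransformer queryTransformer = new CompressingQueryTransformer(chatModel);

RetrievalAugmentor retrievalAugmentor = DefaultRetrievalAugmentor.builder()
        .queryTransformer(queryTransformer)
        .contentRetriever(contentRetriever)
        .build();

The resulting retrievalAugmentor is passed to AiServices via retrievalAugmentor(...), exactly as in the Query Routing example further below.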
Query Routing
Private data exists in various sources like Confluence, Git, databases, and search engines. In a multi-source RAG setup, sending all queries to every ContentRetriever is inefficient. Query routing optimizes this by directing queries to the most relevant retriever using:
- Rules (e.g., user privileges, location).
- Keywords (e.g., specific terms trigger specific retrievers).
- Semantic similarity (using embeddings).
- An LLM to decide routing.
For methods 1–3, use a custom QueryRouter; for method 4, use a LanguageModelQueryRouter.
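For illustration, here is what a hand-rolled router for method 2 (keywords) might look like. This is a minimal sketch with hypothetical class and field names, assuming two retrievers configured like the ones in the example below:

import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.query.Query;
import dev.langchain4j.rag.query.router.QueryRouter;

import java.util.Collection;
import java.util.List;

// routes insurance-related queries to one retriever and everything else to another
class KeywordQueryRouter implements QueryRouter {

    private final ContentRetriever insuranceRetriever;
    private final ContentRetriever biographyRetriever;

    KeywordQueryRouter(ContentRetriever insuranceRetriever, ContentRetriever biographyRetriever) {
        this.insuranceRetriever = insuranceRetriever;
        this.biographyRetriever = biographyRetriever;
    }

    @Override
    public Collection<ContentRetriever> route(Query query) {
        if (query.text().toLowerCase().contains("insurance")) {
            return List.of(insuranceRetriever);
        }
        return List.of(biographyRetriever);
    }
}

An instance of such a router would be passed to DefaultRetrievalAugmentor.builder().queryRouter(...), just like the LanguageModelQueryRouter in the example below.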
As you can see, there are different implementations of Advanced RAG, and you can also customize or combine them. In the following example, we will implement Query Routing with the LanguageModelQueryRouter. Here is the file John_Doe_Biography.txt (an imaginary biography) you'll need to add to the main resources directory of your project.
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.DocumentSplitter;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.ollama.OllamaEmbeddingModel;
import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.rag.query.router.LanguageModelQueryRouter;
import dev.langchain4j.rag.query.router.QueryRouter;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AdvancedRAGExample {

    public static final String OLLAMA_URL = "http://localhost:11434";
    public static final String MODEL_NAME = "llama3.1";
    public static final String EMBEDDING_MODEL_NAME = "nomic-embed-text";

    public static void main(String[] args) {
        // embeds (or "vectorizes") document segments, a process essential for performing similarity searches
        OllamaEmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
                .baseUrl(OLLAMA_URL)
                .modelName(EMBEDDING_MODEL_NAME)
                .build();

        EmbeddingStore<TextSegment> biographyEmbeddingStore =
                embed(Path.of("src/main/resources/John_Doe_Biography.txt"), embeddingModel);

        // retrieves the top segments from the embedding store based on semantic similarity to the query
        ContentRetriever biographyContentRetriever = EmbeddingStoreContentRetriever.builder()
                .embeddingStore(biographyEmbeddingStore)
                .embeddingModel(embeddingModel)
                .maxResults(2) // limits the number of retrieved segments to the top 2
                .minScore(0.6) // sets a similarity threshold of 0.6 to filter low-relevance results
                .build();

        EmbeddingStore<TextSegment> termsAndConditionsEmbeddingStore =
                embed(Path.of("src/main/resources/Green_Secure_Shield.txt"), embeddingModel);

        ContentRetriever termsAndConditionsContentRetriever = EmbeddingStoreContentRetriever.builder()
                .embeddingStore(termsAndConditionsEmbeddingStore)
                .embeddingModel(embeddingModel)
                .maxResults(2)
                .minScore(0.6)
                .build();

        // uses the `llama3.1` model hosted at `http://localhost:11434` for answering user queries
        ChatLanguageModel chatModel = OllamaChatModel.builder()
                .baseUrl(OLLAMA_URL)
                .modelName(MODEL_NAME)
                .build();

        Map<ContentRetriever, String> retrieverToDescription = new HashMap<>();
        retrieverToDescription.put(biographyContentRetriever, "biography of John Doe");
        retrieverToDescription.put(termsAndConditionsContentRetriever, "terms and conditions of insurance company");

        // routes each query to the most relevant retriever(s) based on the descriptions above
        QueryRouter queryRouter = new LanguageModelQueryRouter(chatModel, retrieverToDescription);

        // the entry point into the RAG flow in LangChain4j; it can be configured to customize
        // the RAG behavior according to your requirements
        RetrievalAugmentor retrievalAugmentor = DefaultRetrievalAugmentor.builder()
                .queryRouter(queryRouter)
                .build();

        // connects the chat model and the retrieval augmentor
        Assistant assistant = AiServices.builder(Assistant.class)
                .chatLanguageModel(chatModel)
                .retrievalAugmentor(retrievalAugmentor)
                .build();

        System.out.println(assistant.chat("What is the legacy of John Doe?"));
        System.out.println(assistant.chat("I'm under 18, can I use Green Secure Shield Insurance Company insurances?"));
    }

    private static EmbeddingStore<TextSegment> embed(Path documentPath, EmbeddingModel embeddingModel) {
        // loads a document to be used for RAG
        Document document = FileSystemDocumentLoader.loadDocument(documentPath);

        // splits the document into smaller segments, also known as "chunks"
        DocumentSplitter splitter = DocumentSplitters.recursive(300, 0);
        List<TextSegment> segments = splitter.split(document);

        List<Embedding> embeddings = embeddingModel.embedAll(segments).content();

        // holds the embeddings locally
        EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
        embeddingStore.addAll(embeddings, segments);
        return embeddingStore;
    }

    interface Assistant {
        String chat(String userMessage);
    }
}
Here is an example response from the model:
Based on the provided text, I assume that John Doe is a public figure who has left a lasting impact on the world.
The legacy of John Doe can be summarized as follows:
* He was a pioneer in technology and literature.
* His contributions have inspired countless individuals to pursue their dreams.
* He was a dedicated philanthropist, establishing the "Doe Foundation" to support education and environmental initiatives.
* His efforts earned him various humanitarian awards.
Overall, John Doe's legacy is one of innovation, inspiration, and kindness.
No, according to section 2.1 of Green Secure Shield Insurance Company's terms and conditions, their insurance products are available only to individuals aged 18 or older.
Since you're under 18, you would not be eligible to use their insurance services.
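One more technique from the list above, Metadata Filtering, composes with the retrievers we already built: if your segments carry metadata, the retriever can restrict the similarity search to matching segments. Here is a minimal sketch, assuming a hypothetical userId metadata key and an embeddingStore and embeddingModel configured as in the example above:

import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.store.embedding.filter.Filter;

import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;

// only segments whose metadata has userId == "12345" are considered;
// the key and value are illustrative
Filter onlyForCurrentUser = metadataKey("userId").isEqualTo("12345");

ContentRetriever filteredRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)
        .embeddingModel(embeddingModel)
        .filter(onlyForCurrentUser)
        .build();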
Conclusion
RAG is a powerful tool that can be applied to many different problems. It essentially allows us to "add" additional information to the AI model's knowledge base. Using RAG enables you to access and utilize up-to-date information, generate more accurate and factually correct responses, and provide contextually relevant and informative outputs. Choosing the right RAG approach depends on your specific use case, available resources, and the complexity of your information retrieval needs.
For our Code Green optimization tools, we've implemented different RAG approaches based on the specific requirements of each tool. For our IntelliJ IDEA plugin, we chose the Naive RAG implementation, which offers greater control over document processing and retrieval parameters, allowing for more targeted code analysis. Meanwhile, for our web application, we implemented Easy RAG due to its simpler setup and integration capabilities, which was ideal for providing quick recommendations through the web interface.
In our next blog post, we’ll explore these tools in detail, showing how the different RAG implementations are integrated and how they help analyze code, identify inefficiencies, and suggest optimizations to reduce energy consumption in software projects.
Suggestions:
- Start simple with Easy RAG.
- Progress to Naive RAG as your requirements grow.
- Consider Advanced RAG for the most demanding scenarios.
Remember, RAG is not a one-size-fits-all solution. Careful evaluation of your specific requirements will guide you to the most appropriate implementation.
Resources
[1] LangChain4j Documentation
[2] Ollama Website