Advanced Retrieval with LlamaPacks: Elevating RAG in Fewer Lines of Code!

LlamaIndex Introduces an Impressive Suite of RAG Power Tools!

Ashhadul Islam

Python in Plain English

December 7, 2023 (Updated: December 8, 2023)

In this experiment I try out five RAG methods open-sourced by LlamaIndex as LlamaPacks. LlamaIndex claims that these packs reduce a RAG pipeline to roughly one line of code, so let us test that. We will also compare the five to see which one gives the most accurate answers.

TL;DR: The Auto Merging Retriever Pack and the Small-to-big Retrieval Pack provided by LlamaIndex perform the best in this experiment. Read on to see what went down.

Note: We are using the OpenAI APIs for the language generation part. The code is available as a Google Colab notebook.

Step 1: Installing the libraries

pip install llama-index llama-hub rank-bm25
pip install pypdf

Step 2: Loading the data

import nest_asyncio
nest_asyncio.apply()
from llama_index import SimpleDirectoryReader
from llama_index.node_parser import SimpleNodeParser
import os
os.environ["OPENAI_API_KEY"] = "<Your key here>"

# load in some sample data
reader = SimpleDirectoryReader(input_files=["f1.pdf","f2.pdf","f3.pdf"])
documents = reader.load_data()

# parse nodes
node_parser = SimpleNodeParser.from_defaults()
nodes = node_parser.get_nodes_from_documents(documents)

As the code above shows, I am using three PDF files: f1, f2 and f3. Let me explain what they are about.

The three files

f1.pdf: Annual review 2022 for the company Nestlé (79 pages)
f2.pdf: 3 different reports for 2022 rolled into one PDF of 206 pages
f3.pdf: Sustainability report 2022 (63 pages)

Questions to be asked

For each of the 5 strategies open-sourced by LlamaIndex, we are going to ask the following 2 questions:

Q1: What was the Annual Bonus percentage paid out to the CEO?

The answer seems to be on page 46 of f2.

Q2: What is the minimum number of members in the board of directors?

The answer is on page 63 of f2.

Strategy 1: Hybrid Fusion Pack

What this does: Generates multiple queries from the same question, retrieves with both vector search and keyword (BM25) search, and fuses the reranked results before answering. [Source]
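To make the fusion step a little more concrete, here is a minimal sketch of reciprocal rank fusion, one common way to combine ranked lists from different retrievers. I am assuming this is close in spirit to what the pack does internally; the node ids and the constant k=60 are illustrative and not taken from the pack.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of node ids into one ranking.

    Each node scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default (assumption).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, node_id in enumerate(ranking, start=1):
            scores[node_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by node id for determinism.
    return sorted(scores, key=lambda node_id: (-scores[node_id], node_id))


# Hypothetical node ids returned by two retrievers for the same query.
vector_hits = ["n3", "n1", "n7"]
bm25_hits = ["n1", "n9", "n3"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Nodes that show up in both lists (n1 and n3 here) rise to the top, which is the point of mixing a vector retriever with a keyword retriever.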

from llama_index.llama_pack import download_llama_pack

HybridFusionRetrieverPack = download_llama_pack(
    "HybridFusionRetrieverPack",
    "./hybrid_fusion_pack",
)


hybrid_fusion_pack = HybridFusionRetrieverPack(
    nodes, chunk_size=256, vector_similarity_top_k=2, bm25_similarity_top_k=2
)
questions=[
    "What was the Annual Bonus percentage paid out to the CEO?",
    "What is the minimum number of members in the board of directors?"
]
# this will run the full pack
for question in questions:
    response = hybrid_fusion_pack.run(question)
    print(question,"\n",str(response))
    print("*"*100)

Output for the questions

Q1: What was the Annual Bonus percentage paid out to the CEO?

A: The Annual Bonus percentage paid out to the CEO was 150%.

Q2: What is the minimum number of members in the board of directors?

A: The minimum number of members in the board of directors is not specified in the given context information.

At first glance, it looks like it was able to answer 1 out of 2 questions.

Strategy 2: Query Rewriting Retriever Pack

What this does: Rewrites the question into multiple queries, retrieves for each of them, and then aggregates the results into one answer (Query Rewriting + Fusion). [Source]
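As a rough illustration of the "rewrite, then fuse" idea, the sketch below runs several query variants through a retriever and keeps the best score seen for each node. Everything here is a stand-in: the variants, the node ids and the scores are made up for the example, and in the real pack an LLM writes the variants and a vector index does the retrieval.

```python
def fuse_results(queries, retrieve, top_k=2):
    """Run every query variant and keep each node's best score."""
    best = {}
    for query in queries:
        for node_id, score in retrieve(query):
            if score > best.get(node_id, float("-inf")):
                best[node_id] = score
    # Sort by score (descending), then by node id so the order is stable.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical variants an LLM might produce for Q1.
variants = [
    "What was the Annual Bonus percentage paid out to the CEO?",
    "CEO short-term bonus payout as a percentage of target",
]

# Toy retriever: canned (node_id, similarity) pairs per query (made-up values).
canned = {
    variants[0]: [("target_level", 0.82), ("governance", 0.40)],
    variants[1]: [("payout_table", 0.91), ("target_level", 0.60)],
}

top = fuse_results(variants, lambda query: canned[query])
print(top)
```

The design point is that a rephrased query can surface a chunk the original wording misses, and the fusion step lets that chunk compete with the rest.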

# Download and Initialize Pack
from llama_index.llama_pack import download_llama_pack

QueryRewritingRetrieverPack = download_llama_pack(
    "QueryRewritingRetrieverPack",
    "./query_rewriting_pack",
)

query_rewriting_pack = QueryRewritingRetrieverPack(
    nodes,
    chunk_size=256,
    vector_similarity_top_k=2,
)
# this will run the full pack
for question in questions:
    response = query_rewriting_pack.run(question)
    print(question,"\n",str(response))
    print("*"*100)

Generated responses

Q1: What was the Annual Bonus percentage paid out to the CEO?

A: The Annual Bonus percentage paid out to the CEO was 102.5%.

Interesting! Here it says that the annual bonus percentage paid out to the CEO was 102.5%, whereas the previous pack answered 150%. Let us look at the source material:

The 150% figure is NOT the amount that was paid; it is the applicable target level.

So we go back to the file and do a reverse search for the value 102.5%.

Sure enough, 102.5% is the actual amount that was paid to the CEO.

This means that the Hybrid Fusion Pack gave a wrong answer.
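If you want to repeat this reverse search in code rather than in a PDF viewer, a small sketch with pypdf (installed in the first step) could look like the one below. The helper names are my own, and the sample page texts are toy strings standing in for the real report, so treat it as a starting point and not as the exact check I ran.

```python
def load_page_texts(pdf_path):
    """Extract the text of every page with pypdf."""
    # Imported lazily so find_pages below can be used without pypdf installed.
    from pypdf import PdfReader

    return [page.extract_text() or "" for page in PdfReader(pdf_path).pages]


def find_pages(page_texts, needle):
    """Return the 1-based page numbers whose text contains the needle."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if needle in text
    ]


# Toy page texts; with the real file you would call load_page_texts("f2.pdf").
sample_pages = [
    "Target level: 150%",
    "No bonus data on this page",
    "Payout: 102.5% of target",
]
print(find_pages(sample_pages, "102.5%"))
```

Keep in mind that the page numbers returned this way count pages in the PDF file, which may differ from the numbers printed on the pages.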

Q2: What is the minimum number of members in the board of directors?

A: The minimum number of members in the board of directors is not specified in the given context information.

So this one got 1/2, and the Hybrid Fusion Pack's score drops to 0/2.

Strategy 3: Auto Merging Retriever Pack

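It loads a document and builds a hierarchical node graph (with bigger parent nodes and smaller child nodes). At query time it retrieves the small child nodes and merges them into their parent node when enough children of the same parent are retrieved, and then it generates the answer. [Source]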
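The merging idea can be sketched in a few lines of plain Python. This is only a toy version under my own assumptions: the node ids are invented, and the 0.5 ratio threshold is an illustrative choice, not a value I read out of the pack.

```python
from collections import defaultdict


def auto_merge(retrieved_leaf_ids, parent_of, children_of, ratio_threshold=0.5):
    """Replace retrieved leaves by their parent when enough siblings were hit."""
    hits_per_parent = defaultdict(list)
    for leaf in retrieved_leaf_ids:
        hits_per_parent[parent_of[leaf]].append(leaf)

    merged = []
    for parent, leaves in hits_per_parent.items():
        ratio = len(leaves) / len(children_of[parent])
        if ratio > ratio_threshold:
            merged.append(parent)  # enough children retrieved: use the bigger chunk
        else:
            merged.extend(leaves)  # otherwise keep the individual leaves
    return merged


# Hypothetical two-level hierarchy: each parent chunk has three leaf chunks.
children_of = {"P1": ["a", "b", "c"], "P2": ["d", "e", "f"]}
parent_of = {leaf: parent for parent, leaves in children_of.items() for leaf in leaves}

print(auto_merge(["a", "b", "d"], parent_of, children_of))
```

Two of the three leaves under P1 were retrieved, so they are swapped for the whole parent, while the lone hit under P2 stays a small chunk.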

from llama_index.llama_pack import download_llama_pack

AutoMergingRetrieverPack = download_llama_pack(
    "AutoMergingRetrieverPack",
    "./auto_merging_retriever_pack",
)
auto_merging_pack = AutoMergingRetrieverPack(documents)
# this will run the full pack
for question in questions:
    response = auto_merging_pack.run(question)
    print(question,"\n",str(response))
    print("*"*100)

Q1: What was the Annual Bonus percentage paid out to the CEO?

A: The Annual Bonus percentage paid out to the CEO was 102.5% in 2022.

Q2: What is the minimum number of members in the board of directors?

A: The minimum number of members in the board of directors is seven.

Both answers are correct, so this one scores 2/2.

Strategy 4: Sentence Window Retriever

It loads a document, chunks it up, adds surrounding context as metadata to each chunk, and during retrieval inserts the context back into each chunk for response synthesis. [Source]
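Here is a small sketch of that idea, assuming a naive regex sentence splitter and a window of one sentence on each side. The function names and the toy text are mine and are only there for illustration; the pack itself handles parsing and embedding for you.

```python
import re


def build_sentence_nodes(text, window_size=1):
    """Create one node per sentence and store its neighbours as metadata."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    nodes = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append({"text": sentence, "window": " ".join(sentences[start:end])})
    return nodes


def expand_for_synthesis(node):
    """Swap the single sentence for its surrounding window before answering."""
    return node["window"]


# Toy text, not taken from the reports.
toy_text = "Alpha is first. Beta is second. Gamma is third. Delta is fourth."
nodes = build_sentence_nodes(toy_text, window_size=1)

# Retrieval would match on the single sentence; the LLM sees the wider window.
print(nodes[1]["text"])
print(expand_for_synthesis(nodes[1]))
```

Matching on single sentences keeps retrieval precise, and the window gives the LLM enough surrounding text to write a sensible answer.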

SentenceWindowRetrieverPack = download_llama_pack(
    "SentenceWindowRetrieverPack",
    "./sentence_window_retriever_pack",
)
sentence_window_retriever_pack = SentenceWindowRetrieverPack(
    documents,
)
# this will run the full pack
for question in questions:
    response = sentence_window_retriever_pack.run(question)
    print(question,"\n",str(response))
    print("*"*100)

Q1: What was the Annual Bonus percentage paid out to the CEO?

A: The Annual Bonus percentage paid out to the CEO in 2022 was 102.5%.

Q2: What is the minimum number of members in the board of directors?

A: The context information does not provide the minimum number of members in the board of directors.

Score: 1/2

Strategy 5: Small-to-big Retrieval Pack

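It loads a document and builds a hierarchical node graph (with bigger parent nodes and smaller child nodes). Retrieval runs over the smaller child nodes, and the bigger parent nodes they point to are then used to generate the answer. [Source]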
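A toy version of small-to-big retrieval is sketched below. I am using simple word overlap in place of embeddings, and the chunk texts are invented for the example (they are not quoted from the reports), so read it as an illustration of the pattern and not of the pack's internals.

```python
def build_child_index(parents, child_size=5):
    """Split each parent chunk into small word windows that point back to it."""
    children = []
    for parent_id, text in parents.items():
        words = text.split()
        for start in range(0, len(words), child_size):
            children.append((parent_id, " ".join(words[start:start + child_size])))
    return children


def retrieve_small_to_big(query, parents, children, top_k=1):
    """Score the small children by word overlap, then return their parents."""
    query_words = set(query.lower().split())
    scored = sorted(
        children,
        key=lambda child: -len(query_words & set(child[1].lower().split())),
    )
    parent_ids = []
    for parent_id, _ in scored:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
        if len(parent_ids) == top_k:
            break
    return [parents[parent_id] for parent_id in parent_ids]


# Toy parent chunks (invented text).
parents = {
    "p1": "the board shall consist of a minimum number of members as set out in the articles",
    "p2": "the annual bonus is paid in cash or shares depending on the election of the executive",
}
children = build_child_index(parents)
result = retrieve_small_to_big("minimum number of members", parents, children)
print(result)
```

The match happens on a five-word child, but what gets handed to the LLM is the full parent chunk, which is the whole trick behind small-to-big.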

RecursiveRetrieverSmallToBigPack = download_llama_pack(
    "RecursiveRetrieverSmallToBigPack",
    "./recursive_retriever_stb_pack",
)

recursive_retriever_stb_pack = RecursiveRetrieverSmallToBigPack(
    documents,
)
# this will run the full pack
for question in questions:
    response = recursive_retriever_stb_pack.run(question)
    print(question,"\n",str(response))
    print("*"*100)

Q1: What was the Annual Bonus percentage paid out to the CEO?

A: The Annual Bonus percentage paid out to the CEO was 102.5%.

Q2: What is the minimum number of members in the board of directors?

A: The minimum number of members in the board of directors is seven.

Both answers are correct, so this one scores 2/2.

Here is the comparison:

Strategy 1 (Hybrid Fusion Retriever Pack): 0/2
Strategy 2 (Query Rewriting Retriever Pack): 1/2
Strategy 3 (Auto Merging Retriever Pack): 2/2
Strategy 4 (Sentence Window Retriever): 1/2
Strategy 5 (Small-to-big Retrieval Pack): 2/2

Looks like the Auto Merging Retriever Pack and Small-to-big Retrieval Pack provided by LlamaIndex perform the best for this experiment. In fact, I found both to be very similar.
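If you would rather tally the scores in code than by hand, a quick sketch is below. It recomputes each score from the answers quoted in the sections above, using a plain substring check against the values we verified in the PDF (102.5% for Q1 and seven for Q2). The substring check is a simplification of my own and would not be a robust grader in general.

```python
# Values verified against the source PDF in the sections above.
expected = {"Q1": "102.5%", "Q2": "seven"}

# Answers copied from the outputs of each pack.
answers = {
    "Hybrid Fusion": {
        "Q1": "The Annual Bonus percentage paid out to the CEO was 150%.",
        "Q2": "The minimum number of members in the board of directors is not specified in the given context information.",
    },
    "Query Rewriting": {
        "Q1": "The Annual Bonus percentage paid out to the CEO was 102.5%.",
        "Q2": "The minimum number of members in the board of directors is not specified in the given context information.",
    },
    "Auto Merging": {
        "Q1": "The Annual Bonus percentage paid out to the CEO was 102.5% in 2022.",
        "Q2": "The minimum number of members in the board of directors is seven.",
    },
    "Sentence Window": {
        "Q1": "The Annual Bonus percentage paid out to the CEO in 2022 was 102.5%.",
        "Q2": "The context information does not provide the minimum number of members in the board of directors.",
    },
    "Small-to-big": {
        "Q1": "The Annual Bonus percentage paid out to the CEO was 102.5%.",
        "Q2": "The minimum number of members in the board of directors is seven.",
    },
}


def score(pack_answers):
    """Count the questions whose answer contains the expected value."""
    return sum(expected[q] in pack_answers[q] for q in expected)


scores = {pack: score(pack_answers) for pack, pack_answers in answers.items()}
for pack, points in scores.items():
    print(f"{pack}: {points}/{len(expected)}")
```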

The code is available as a Google Colab notebook. Give the packs a twirl and let me know if you find this useful.

Until next time.
