|ARTIFICIAL INTELLIGENCE|LLMs|FINE-TUNING|

"Knowledge has to be improved, challenged, and increased constantly, or it vanishes." — Peter Drucker "Learning without thinking is useless. Thinking without learning is dangerous." ― Confucius

Large Language Models (LLMs) are trained on huge text corpora, from which they acquire a large amount of factual knowledge. This knowledge is embedded in their parameters and can then be used when needed. It is, however, "crystallized" at the end of training: once pretraining ends, the model effectively stops learning.

Thereafter, the model can be aligned or undergo instruction tuning to learn how to make the best use of this knowledge and how to respond more naturally to a user's questions. Sometimes this knowledge is not enough, though, because it is generalist and not tailored to the domain of interest. Although the model can access external memory through RAG, adapting the model to a new domain through fine-tuning is often considered beneficial. Typically, this fine-tuning is conducted on inputs created by human annotators or other LLMs. During this phase, the model encounters additional factual knowledge and integrates it into its parameters.

How does the model integrate this new additional knowledge?

In fact, at the mechanistic level we do not really know how this interaction takes place. Moreover, according to some, exposure to this new knowledge may lead the model to hallucinate, because the model is trained to generate facts that are not grounded in (or may even conflict with) its pre-existing knowledge. In addition, as seen earlier, models can struggle with rare knowledge (e.g., entities that appear infrequently in the pretraining corpus).

image source: here

Therefore, a recently published study analyzed what happens when a model is presented with new knowledge during fine-tuning.

The authors investigated in detail what happens to a model that goes through fine-tuning, and what happens to its responses after it acquires new knowledge.

To begin with, they classified each fine-tuning example according to how consistent its knowledge is with the model's existing knowledge. A new example inherently carries knowledge that may not be consistent with what the model already knows. An example can be known or unknown, and known examples are further divided into highly known, maybe known, and weakly known.
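To make the categorization more concrete, here is a minimal, simplified sketch of how such labels could be assigned, loosely following the idea of sampling the base model's answers before fine-tuning: an example is unknown if the model never produces the correct answer, highly known if greedy decoding is always correct, and so on. The `categorize_example` helper and the exact-match rule are illustrative assumptions, not the authors' actual implementation.

```python
# A minimal, simplified sketch (not the paper's exact procedure) of labeling an
# example by how well the base model already knows it. The answer lists would come
# from prompting the model several times at temperature 0 (greedy) and at a
# temperature > 0; here they are plain strings so the snippet is self-contained.

def categorize_example(ground_truth: str,
                       greedy_answers: list[str],
                       sampled_answers: list[str]) -> str:
    """Classify one QA pair with respect to the base model's pre-existing knowledge."""
    def correct(ans: str) -> bool:
        return ans.strip().lower() == ground_truth.strip().lower()

    p_greedy = sum(map(correct, greedy_answers)) / len(greedy_answers)
    p_sampled = sum(map(correct, sampled_answers)) / len(sampled_answers)

    if p_greedy == 0 and p_sampled == 0:
        return "Unknown"        # the model never produces the right answer
    if p_greedy == 1.0:
        return "HighlyKnown"    # greedy decoding is always correct
    if p_greedy > 0:
        return "MaybeKnown"     # greedy decoding is sometimes correct
    return "WeaklyKnown"        # only sampling at T > 0 recovers the answer


print(categorize_example("France",
                         greedy_answers=["France", "France"],
                         sampled_answers=["France", "Italy"]))  # -> HighlyKnown
```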

image source: here

The authors then took a model (PaLM 2-M) and fine-tuned it on such a dataset. Each fine-tuning example is structured as a factual-knowledge triplet (subject, relation, object), which is converted into a question about that specific triplet (e.g., "Where is Paris located?") together with the ground-truth answer (e.g., "France"). In other words, they provide the model with some new knowledge and then restructure these triplets into question-answer pairs to test its knowledge. They divided all these examples into the categories discussed above and then evaluated the answers.
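As a toy illustration of this setup, the snippet below turns (subject, relation, object) triplets into question-answer pairs. The relation-to-question templates are invented for illustration and are not the ones used in the paper.

```python
# A toy illustration of restructuring factual triplets into QA pairs for fine-tuning.
# The relation-to-question templates below are invented for illustration only.

TEMPLATES = {
    "located_in": "Where is {subject} located?",
    "founded_by": "Who founded {subject}?",
}

def triplet_to_qa(subject: str, relation: str, obj: str) -> dict:
    """Turn a (subject, relation, object) triplet into a question-answer pair."""
    question = TEMPLATES[relation].format(subject=subject)
    return {"question": question, "answer": obj}

print(triplet_to_qa("Paris", "located_in", "France"))
# {'question': 'Where is Paris located?', 'answer': 'France'}
```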

The authors fine-tuned the model and then tested it for hallucinations. They found that a high percentage of unknown facts leads to performance degradation (and this is not compensated by a longer fine-tuning time).

image source: here

Training for multiple epochs, in fact, seems to have a deleterious effect. Previous studies had already shown that more epochs lead to performance degradation (possibly due to overfitting); for the authors, this effect is amplified when the percentage of unknown facts is high.

For the authors, unknown facts have an almost neutral effect at a low number of epochs but harm performance with more epochs. It thus appears that unknown examples are harmful, but their negative effect mostly materializes in later training stages. The authors then study the dynamics of fitting the examples: the figure presents the training accuracy on the Known and Unknown subsets of the dataset as a function of fine-tuning duration, and it shows that the model learns the unknown examples only at a late stage.

Lastly, since Unknown examples are the ones that are likely to introduce new factual knowledge, their significantly slow fitting rate suggests that LLMs struggle to acquire new factual knowledge through fine-tuning, instead they learn to expose their preexisting knowledge using the Known examples. (source)
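A rough sketch of how one could monitor these fitting dynamics in one's own fine-tuning run is shown below. `train_one_epoch` and `generate` are hypothetical stand-ins for an actual training step and a greedy-decoding call; the point is simply to track exact-match accuracy separately for each knowledge category after every epoch.

```python
# A rough sketch of monitoring per-category training accuracy across epochs.
# `train_one_epoch(model)` and `generate(model, question)` are hypothetical
# stand-ins for an actual training step and a greedy-decoding call.

from collections import defaultdict

def exact_match_accuracy(model, examples, generate) -> float:
    """Fraction of examples whose greedy answer matches the ground truth."""
    hits = sum(generate(model, ex["question"]).strip() == ex["answer"]
               for ex in examples)
    return hits / max(len(examples), 1)

def track_fitting_dynamics(model, subsets, dev_set, train_one_epoch, generate,
                           epochs=20):
    """subsets maps category names (e.g. 'Known', 'Unknown') to example lists."""
    history = defaultdict(list)
    for _ in range(epochs):
        model = train_one_epoch(model)
        for category, examples in subsets.items():
            history[category].append(exact_match_accuracy(model, examples, generate))
        history["dev"].append(exact_match_accuracy(model, dev_set, generate))
    return history  # e.g. compare history["Known"] vs history["Unknown"] per epoch
```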

image source: here

The authors then ask whether this relationship between accuracy and known and unknown examples can be quantified, and whether it is linear. The results show a strong linear relationship: unknown examples hurt performance, while known ones improve it almost as strongly (the associated coefficients in the linear regression are very close in magnitude).
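The sketch below illustrates the kind of linear fit this refers to: regressing test accuracy on the fractions of known and unknown training examples the model has fitted at each checkpoint. The numbers are synthetic, generated from an assumed relationship so that the known coefficient comes out positive and the unknown one negative, mimicking the paper's qualitative finding rather than reproducing its data.

```python
# Illustration of the linear-regression analysis described above, on synthetic data.
# Test accuracy is regressed on the fractions of Known and Unknown training examples
# the model has fitted at each checkpoint.

import numpy as np

# one row per fine-tuning checkpoint: [fraction of Known fitted, fraction of Unknown fitted]
X = np.array([[0.2, 0.0], [0.5, 0.1], [0.8, 0.3], [0.9, 0.7], [1.0, 0.9]])

# synthetic accuracies built from an assumed relationship (+0.10 per Known, -0.12 per Unknown)
test_accuracy = 0.40 + X @ np.array([0.10, -0.12])

# add an intercept column and solve the least-squares problem
A = np.hstack([np.ones((len(X), 1)), X])
intercept, beta_known, beta_unknown = np.linalg.lstsq(A, test_accuracy, rcond=None)[0]

print(f"known coefficient:   {beta_known:+.3f}")    # positive: fitted Known examples help
print(f"unknown coefficient: {beta_unknown:+.3f}")  # negative: fitted Unknown examples hurt
```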

image source: here

Moreover, fine-tuning does not only affect performance on a specific task; it has a broad effect on the model's knowledge. Using out-of-distribution (OOD) test sets, the authors show that unknown examples are harmful for OOD performance as well. According to the authors, there is also a relationship with the occurrence of hallucinations:

Overall, our insights transfer across relations. This essentially shows that fine-tuning on Unknown examples such as "Where is [E1] located?", can encourage hallucinations on seemingly unrelated questions, such as "Who founded [E2]?". (source)

An interesting result is that the best performance is obtained not with highly known examples but with maybe-known ones. In other words, these examples allow the model to make better use of its prior knowledge (facts that are already too well known do not have a useful effect on the model).

image source: here

In contrast, unknown and weakly known facts hurt the model's performance, and this decline is driven by an increase in hallucinations.

This work highlights the risk in using supervised fine-tuning to update LLMs' knowledge, as we present empirical evidence that acquiring new knowledge through finetuning is correlated with hallucinations w.r.t preexisting knowledge. (source)

So, according to the authors, this unknown knowledge can harm performance (which makes fine-tuning almost useless in that case). Preliminary results suggest that flagging this unknown knowledge with "I don't know" can help reduce the damage.
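A minimal sketch of this mitigation is shown below, assuming the fine-tuning examples have already been labeled with the categories discussed earlier; the exact abstention phrasing is an assumption.

```python
# Sketch of the mitigation mentioned above: before fine-tuning, replace the
# ground-truth answer of Unknown examples with an abstention string so the model
# learns to say it does not know rather than to produce ungrounded facts.

IDK_ANSWER = "I don't know."  # exact phrasing is an assumption

def relabel_unknown(examples: list[dict]) -> list[dict]:
    """Return a copy of the fine-tuning set where Unknown examples teach abstention."""
    return [
        {**ex, "answer": IDK_ANSWER} if ex.get("category") == "Unknown" else dict(ex)
        for ex in examples
    ]

dataset = [
    {"question": "Where is Paris located?", "answer": "France", "category": "HighlyKnown"},
    {"question": "Who founded [E2]?", "answer": "[P3]", "category": "Unknown"},
]
print(relabel_unknown(dataset)[1]["answer"])  # -> I don't know.
```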

image source: here

Acquiring new knowledge via supervised fine-tuning is correlated with hallucinations w.r.t. pre-existing knowledge. LLMs struggle to integrate new knowledge through fine-tuning and mostly learn to use their pre-existing knowledge. (source)

In conclusion, for the authors, the model is damaged if it is presented with unknown knowledge during fine-tuning, and this performance degradation is correlated with an increase in hallucinations. In contrast, examples that are maybe known have a beneficial effect. This shows that the model struggles to integrate new knowledge; in other words, there is a conflict between what the model has learned and how it uses the new knowledge. This could be related to alignment and instruction tuning (which, unfortunately, were not investigated in this work).

On the one hand, if one wants to use models with some domain-specific knowledge, this study suggests that it is better to use RAG. On the other hand, the results with the "I don't know" flagging suggest that other strategies can be found to overcome these limitations of fine-tuning.

The fact that maybe-known facts are actually beneficial is reassuring. Today's models are trained on huge amounts of text, which means they have already seen an enormous amount of knowledge. Many of the facts in a fine-tuning dataset are therefore likely to be maybe-known to the model and do not harm it (perhaps only a small part will be unknown).

In any case, this study is interesting and shows that there are still unclear aspects of fine-tuning and of how conflicts between new and old knowledge are resolved. It is another reason to always compare a model's results before and after fine-tuning.

What do you think? Have you experienced an increase in post-fine-tuning hallucinations?

If you have found this interesting:

You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects, and you can also subscribe for free to get notified when I publish a new story.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.


Reference

Here is the list of the principal references I consulted to write this article (only the first author of each article is cited):

  1. Huang, 2023, A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions, link
  2. Gao, 2021, Behavior Cloning is Miscalibrated, link
  3. Gekhman, 2024, Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?, link