Zero-shot labeling with a large language model (LLM) such as GPT is a promising way to quickly create training data with minimal human input. It enables training AI systems without manually labeling the entire dataset. However, one disadvantage of this approach is that it struggles to accurately classify complex and ambiguous entities.

Imagine a scenario where an AI system needs to label entities in news articles. While classifying straightforward topics like "sports" or "politics" might be a breeze, things get tricky when we encounter more intricate entities like "artificial intelligence regulations," "climate change agreements," or "financial market fluctuations." These labels often carry inherent ambiguity, and traditional auto-labeling systems may stumble when trying to disentangle the subtle nuances that differentiate one label from another.

This is where the concept of "Description guided zero-shot labeling" enters the scene. By providing concise and informative descriptions for each label, we equip our LLM with invaluable context and clarity. This approach holds the promise of significantly enhancing the accuracy of zero-shot auto-labeling by offering guidance and disambiguation precisely when it's needed most.

In this article, we explore the challenges posed by complex entities, and demonstrate how the inclusion of label descriptions can be a game-changer. We will examine the mechanics of label description guided auto-labeling, present real-world case studies and experiments, discuss its potential applications across industries, and explore the challenges that lie ahead on the road to achieving enhanced accuracy.

Setting up Zero-Shot Labeling

In this section, we will walk through the process of enabling description-guided auto-labeling using UbiAI, a powerful labeling platform designed to streamline the labeling process and model fine-tuning. We'll illustrate this tutorial with practical examples from litigation case analysis.

For this tutorial, we are going to identify the plaintiff, the defendants, and their claims from litigation cases using zero-shot labeling. First, we upload the document to UbiAI. Below is a small snippet of the document:

Litigation Document. Image by Author.

We will extract the defendant and plaintiff names as well as the claims of each party. To do so, we simply add the labels in UbiAI and enable the zero-shot LLM feature:

UbiAI Labeling Interface. Image by Author.

Let's first run zero-shot labeling without adding a description to any label:

UbiAI Zero-shot Labeling Configuration Window. Image By Author.

Here is the result:

LLM Zero-shot Labeling Without Description. Image by Author.

Although the plaintiff and defendant names were identified correctly, the claims of each party were not extracted.

Now, let's add descriptions for each label:

PLAINTIFF: Identify the name of the plaintiff. Do not extract sentences.

DEFENDANT: Identify the name of the defendant. Do not extract sentences.

CLAIM_PLAINTIFF: Identify the sentence describing the claim of the plaintiff.

CLAIM_DEFENDANT: Identify the sentence describing the claim of the defendant.

Under the hood, UbiAI leverages OpenAI's new "function calling" feature to attach a description to each label.
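To make the idea concrete, here is a minimal sketch of how label descriptions could be carried in a function-calling tool definition. UbiAI's actual schema is not public in this article, so the function name `extract_entities`, the helper `build_extraction_tool`, and the choice of one array-of-strings field per label are all assumptions made for illustration. The only part taken from the tutorial is the set of labels and their descriptions.

```python
import json

# Label names and descriptions taken from the tutorial above.
LABEL_DESCRIPTIONS = {
    "PLAINTIFF": "Identify the name of the plaintiff. Do not extract sentences.",
    "DEFENDANT": "Identify the name of the defendant. Do not extract sentences.",
    "CLAIM_PLAINTIFF": "Identify the sentence describing the claim of the plaintiff.",
    "CLAIM_DEFENDANT": "Identify the sentence describing the claim of the defendant.",
}


def build_extraction_tool(label_descriptions, name="extract_entities"):
    """Build a tool definition with one described field per label.

    Hypothetical helper: the function name and the field layout are
    illustrative choices, not UbiAI's actual schema.
    """
    properties = {
        label: {
            "type": "array",              # a label may match zero or more spans
            "items": {"type": "string"},  # each span is returned as plain text
            "description": description,   # the per-label guidance for the LLM
        }
        for label, description in label_descriptions.items()
    }
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": "Extract labeled text spans from a document.",
            "parameters": {"type": "object", "properties": properties},
        },
    }


tool = build_extraction_tool(LABEL_DESCRIPTIONS)
print(json.dumps(tool, indent=2))
```

A definition like this would typically be passed in the `tools` list of a chat completion request, and the model's reply would then contain JSON arguments keyed by label. No field is marked as required here, so the model can leave out a label that has no match in the document.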

It is important to add a clear and concise description for each label to guide the LLM effectively. In our experience, adding positive and negative examples to a description also improves accuracy.
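One way to keep such descriptions consistent is to assemble them from an instruction plus optional positive and negative examples. The sketch below is only an illustration: `describe_label` is a hypothetical helper, the wording "Examples to extract" and "Do not extract" is an arbitrary choice, and the example names are invented rather than taken from the litigation document.

```python
def describe_label(instruction, positive=(), negative=()):
    """Compose a label description from an instruction and optional examples."""
    parts = [instruction.strip()]
    if positive:
        quoted = "; ".join(f'"{example}"' for example in positive)
        parts.append(f"Examples to extract: {quoted}.")
    if negative:
        quoted = "; ".join(f'"{example}"' for example in negative)
        parts.append(f"Do not extract: {quoted}.")
    return " ".join(parts)


# Invented examples, for illustration only.
plaintiff_description = describe_label(
    "Identify the name of the plaintiff. Do not extract sentences.",
    positive=["Jane Doe", "Acme Holdings LLC"],
    negative=["the plaintiff", "Jane Doe filed suit against Acme Holdings LLC"],
)
print(plaintiff_description)
```

The resulting string can be pasted into the description field for the label. Keeping the examples short helps the description stay concise.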

LLM Zero-Shot Configuration Window. Image by Author.

In UbiAI, you can select either a GPT-3 or a GPT-4 model. To enable per-label descriptions, we need to switch to the 16k context length, since the larger context window leaves room for the descriptions alongside the document. Then click the edit description button to enter each description.
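As a rough sanity check on why the larger context helps, the sketch below estimates whether a document plus its label descriptions fits a given token budget. It rests on several assumptions: the common rule of thumb of about four characters per token for English text (not an exact tokenizer), a budget of 16,000 tokens standing in for the 16k option, and an arbitrary 1,000 tokens reserved for the model's output. The helper names are hypothetical.

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb for English text, not a real tokenizer


def estimate_tokens(text):
    """Return a coarse token estimate based on character count."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(document, descriptions, context_tokens=16_000,
                    reserved_for_output=1_000):
    """Check whether the document and all descriptions fit the token budget."""
    prompt_tokens = estimate_tokens(document)
    prompt_tokens += sum(estimate_tokens(d) for d in descriptions)
    return prompt_tokens + reserved_for_output <= context_tokens


descriptions = [
    "Identify the name of the plaintiff. Do not extract sentences.",
    "Identify the name of the defendant. Do not extract sentences.",
]
short_document = "x" * 20_000  # stand-in for roughly 5,000 tokens of text
long_document = "x" * 80_000   # stand-in for roughly 20,000 tokens of text
print(fits_in_context(short_document, descriptions))
print(fits_in_context(long_document, descriptions))
```

For an exact count, a real tokenizer for the chosen model would be needed. The heuristic is only meant to show that descriptions add to the prompt size on top of the document itself.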

Description-Guided Zero-Shot Labeling

We are now ready to run the LLM Zero-Shot Labeling with the added description. With the help of the provided description, the LLM is now able to correctly extract the plaintiff claims as shown below.

LLM Zero-shot Labeling With Description. Image by Author.

However, it incorrectly labeled a defendant claim (CLAIM_DEFENDANT), even though none was present in the document. Further clarification in the description should help.

LLM Zero-shot Labeling With Description. Image by Author.

Conclusion

In this tutorial, we showcased the practical application of description-guided auto-labeling using UbiAI. We walked through setting up labels, defining clear descriptions, configuring the auto-labeling process, and manually verifying the results.

By providing crucial context and guidance, label descriptions help the LLM make more informed identifications and mitigate ambiguity, ultimately leading to more accurate classifications.

As this tutorial showed, guiding the LLM with clear and concise descriptions is crucial to avoiding false positives. With that caveat in mind, the potential benefits for industries, research, and applications are profound.