Large Language Models Are Zero-Shot Text Classifiers

Zhiqiang Wang, Yiran Pang, Yanbin Lin
2023-12-02
Abstract: Pretrained large language models (LLMs) are now widely used across many sub-disciplines of natural language processing (NLP). Within NLP, text classification has received considerable attention, but it still faces limitations: high computational cost, long training time, and weak performance on unseen classes. With the introduction of chain-of-thought (CoT) prompting, LLMs can perform zero-shot learning (ZSL) using step-by-step reasoning prompts instead of conventional question-and-answer formats. Zero-shot LLMs can alleviate these limitations in text classification by using pretrained models directly to predict both seen and unseen classes. Our research primarily validates the capability of GPT models in text classification. We focus on applying prompt strategies effectively to various text classification scenarios. In addition, we compare the performance of zero-shot LLMs with other state-of-the-art text classification methods, including traditional machine learning methods, deep learning methods, and ZSL methods. Experimental results show that LLMs are effective zero-shot text classifiers on three of the four datasets analyzed. This capability is especially useful for small businesses or teams that may not have extensive expertise in text classification.
Computation and Language
What problem does this paper attempt to address?
The paper addresses several key issues in text classification, particularly through the application of zero-shot learning (ZSL):

1. **Reducing computational cost and time consumption**: Traditional machine learning and deep learning methods need large amounts of labeled data to train a text classifier, which is both time-consuming and computationally expensive. The paper proposes using pretrained large language models (LLMs), such as GPT models, to make classification predictions directly, without additional labeled data.
2. **Improving robustness to unseen categories**: Traditional supervised learning methods can only classify categories seen during training and cannot handle new ones. With zero-shot learning, a pretrained model can predict both seen and unseen categories, which improves generalization.
3. **Validating the effectiveness of GPT models**: The research validates GPT models across different text classification scenarios and achieves zero-shot classification through specifically designed prompt strategies.

Additionally, the paper compares zero-shot LLMs with traditional machine learning methods, deep learning methods, and other zero-shot methods across multiple datasets. Experimental results show that the GPT models are effective zero-shot classifiers on three of the four datasets analyzed. This is particularly valuable for small businesses and teams, because these models simplify the text classification process by avoiding feature extraction and classifier training.
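To make the idea of a zero-shot, step-by-step classification prompt concrete, here is a minimal Python sketch. It is not the paper's prompt or code: the prompt wording, the `Label:` answer format, the function names, and the `complete` callable (a stand-in for whatever LLM client is used) are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence


def build_zero_shot_prompt(text: str, labels: Sequence[str]) -> str:
    """Build a prompt that asks for step-by-step reasoning, then a final label.

    The wording is hypothetical; the paper's actual prompts may differ.
    """
    label_list = ", ".join(labels)
    return (
        f"Classify the following text into one of these categories: {label_list}.\n"
        f"Text: {text}\n"
        "Let's think step by step, then give the final answer on the last line "
        "in the form 'Label: <category>'."
    )


def parse_label(response: str, labels: Sequence[str]) -> Optional[str]:
    """Return the candidate label named on the last 'Label:' line, else None."""
    for line in reversed(response.strip().splitlines()):
        line = line.strip()
        if line.lower().startswith("label:"):
            answer = line.split(":", 1)[1].strip().lower()
            for label in labels:
                if label.lower() == answer:
                    return label
            return None  # a 'Label:' line exists but names no known category
    return None  # the model did not follow the answer format


def classify(
    text: str, labels: Sequence[str], complete: Callable[[str], str]
) -> Optional[str]:
    """Classify `text` with no training data; `complete` maps a prompt to a reply."""
    return parse_label(complete(build_zero_shot_prompt(text, labels)), labels)


def fake_model(prompt: str) -> str:
    """Canned reply standing in for a real LLM call (assumption for the demo)."""
    return "The text praises the product.\nLabel: positive"


if __name__ == "__main__":
    print(classify("I love this phone.", ["positive", "negative"], fake_model))
```

Because the label set is passed in at call time, the same function handles categories that were never part of any training set, which is the property the summary refers to as handling unseen categories. Returning `None` when the reply cannot be parsed keeps malformed model output from being silently counted as a prediction.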