Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-following LLM

Ruohong Zhang, Yau-Shian Wang, Yiming Yang
2024-04-15
Abstract: The remarkable performance of large language models (LLMs) in zero-shot language understanding has garnered significant attention. However, employing LLMs for large-scale inference or domain-specific fine-tuning requires immense computational resources due to their substantial model size. To overcome these limitations, we introduce a novel method, namely GenCo, which leverages the strong generative power of LLMs to assist in training a smaller and more adaptable language model. In our method, an LLM plays an important role in the self-training loop of a smaller model in two ways. First, the LLM is used to augment each input instance with a variety of possible continuations, enriching its semantic context for better understanding. Second, it helps craft additional high-quality training pairs by rewriting input texts conditioned on predicted labels. This ensures the generated texts are highly relevant to the predicted labels, alleviating prediction errors during pseudo-labeling while reducing the dependency on large volumes of unlabeled text. In our experiments, GenCo outperforms previous state-of-the-art methods when only limited ($<5\%$ of original) in-domain text data is available. Notably, our approach surpasses the performance of Alpaca-7B with human prompts, highlighting the potential of leveraging LLMs for self-training.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses several key problems in zero-shot text classification:

1. **Computational resource consumption**: Although large language models (LLMs) perform well on zero-shot understanding tasks, their size means that large-scale inference and domain-specific fine-tuning require huge computational resources.
2. **Pseudo-label errors**: Traditional self-training methods rely on a large amount of unlabeled data to generate pseudo-labels. These pseudo-labels may contain errors, which degrade model training.
3. **Data dependence**: Many self-training methods require a large amount of unlabeled text, which may be difficult to obtain in practical applications.

### Main contributions of the paper

To address these problems, the authors propose a new method, Generation-driven Contrastive Self-training (GenCo). The specific contributions are as follows:

1. **Using an LLM to enhance a smaller model**: An LLM assists in training a smaller language model, so that self-training can run efficiently with limited computational resources. This improves classification performance and also reduces the dependence on large-scale unlabeled data.
2. **Handling zero-shot classification in extreme cases**: When only a small amount of unlabeled text is available (less than 5% of the original in-domain data, according to the abstract), the method still outperforms existing baseline methods.
3. **Theoretical support**: The paper provides a theoretical analysis supporting the effectiveness of the proposed contrastive loss function and the generalization ability of the classifier during self-training.

### Method overview

The GenCo framework mainly includes the following steps:

- **Semantic enhancement**: Use the LLM to generate multiple continuations of the input text, enriching the input information and improving the quality of pseudo-label prediction.
- **Conditional generation of training pairs**: Use the LLM to rewrite input texts conditioned on the predicted pseudo-labels, producing high-quality training pairs that reduce the negative impact of pseudo-label errors.
- **Contrastive self-training objective**: Design a contrastive loss function that combines soft labels and entropy regularization to prevent the model from overfitting and to improve classification generalization.

With these components, GenCo's experimental results on multiple benchmark datasets show that it can effectively address the challenges of zero-shot text classification, especially when computational resources are limited and unlabeled data is scarce.
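
The semantic enhancement and soft-label steps in the method overview can be illustrated with a small sketch. This is not the paper's implementation: the toy 2-dimensional embeddings, the dot-product similarity, the temperature values, and the helper names (`mean_vector`, `soft_label_loss`) are all assumptions chosen for illustration, and temperature sharpening is used here as one common way to build soft pseudo-labels with an entropy-reducing effect. In a real system, the embeddings would come from the smaller model's encoder and the continuations from an LLM.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of similarity scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot-product similarity between two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def soft_label_loss(text_vec, label_vecs, pred_temp=1.0, target_temp=0.1):
    """Cross-entropy between a sharpened soft pseudo-label and the prediction.

    The target uses a lower temperature than the prediction, so minimizing
    the loss pushes the prediction toward a lower-entropy distribution.
    (Assumed formulation; the paper's exact loss may differ.)
    """
    scores = [dot(text_vec, label) for label in label_vecs]
    pred = softmax(scores, pred_temp)
    target = softmax(scores, target_temp)  # treated as a constant target
    loss = -sum(t * math.log(p) for t, p in zip(target, pred))
    return loss, pred, target


# Hypothetical embeddings: two class labels, one input, two LLM continuations.
label_vecs = [[1.0, 0.0], [0.0, 1.0]]
input_vec = [0.6, 0.4]
continuation_vecs = [[0.9, 0.1], [0.8, 0.3]]

# Semantic enhancement: average the input with its generated continuations.
augmented_vec = mean_vector([input_vec] + continuation_vecs)

loss, pred, target = soft_label_loss(augmented_vec, label_vecs)
pseudo_label = max(range(len(target)), key=target.__getitem__)
```

In this toy setup the averaged representation leans more clearly toward the first label than the raw input does, and the sharpened target is more confident than the prediction, which is the signal the self-training step would learn from.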