Large Model-Based Data Augmentation for Imbalanced Text Classification

Manman Zhang, Peiyao Zhou, Rongxin Mi, Tianhang Song, Dawei Zhang, Dawei Jin
DOI: https://doi.org/10.1109/AINIT61980.2024.10581735
2024-03-29
Abstract: This study applies large models to the problem of imbalanced data in text classification. Text is central to web data, and class imbalance degrades classifier performance. For these reasons, researchers have explored using large models to generate high-quality minority-class samples that improve model performance. This paper reviews the technical progress of machine learning, deep learning, and large language models, and their applications to text classification. Large models perform well on complex tasks thanks to their strong language understanding. Even so, traditional machine learning and deep learning methods remain popular in text classification scenarios that require fast responses, because they have simpler structures and higher computational efficiency. This study proposes a data augmentation technique inspired by SMOTE (Synthetic Minority Over-sampling Technique). The technique uses a large language model together with a simple prompt engineering strategy to generate high-quality minority-class samples. Experimental results show that the proposed method significantly improves macro-averaged precision, recall, and F1 score across multiple text classification models, effectively alleviating the challenge of class imbalance.
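The abstract describes the method only at a high level, so the sketch below is one possible reading rather than the authors' implementation. It assumes that SMOTE's step of "interpolating between a minority sample and a nearby same-class neighbour" is mapped onto a prompt that shows the LLM two same-class examples and asks for a new one. The token-overlap (Jaccard) similarity, the prompt wording, and the `generate` callable (any function that sends a prompt to an LLM and returns text) are hypothetical stand-ins. A `macro_prf` helper is also included. Macro averaging weights every class equally, so it exposes minority-class failures that plain accuracy hides.

```python
import random
from collections import Counter


def class_deficits(labels):
    """Synthetic samples each class needs to match the largest class."""
    counts = Counter(labels)
    majority = max(counts.values())
    return {label: majority - n for label, n in counts.items() if n < majority}


def jaccard(a, b):
    """Token-set Jaccard similarity (assumed cheap stand-in for embedding distance)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def nearest_neighbor(i, pool):
    """Index of the text in `pool` most similar to pool[i], excluding i itself."""
    others = [j for j in range(len(pool)) if j != i]
    if not others:
        return i  # single-sample class: pair the sample with itself
    return max(others, key=lambda j: jaccard(pool[i], pool[j]))


def build_prompt(label, text_a, text_b):
    """Hypothetical prompt asking the LLM to 'interpolate' two same-class examples."""
    return (
        f"Here are two examples of the class '{label}':\n"
        f"1. {text_a}\n2. {text_b}\n"
        f"Write one new, realistic example of class '{label}' that combines "
        f"features of both. Output only the text."
    )


def augment(texts, labels, generate, seed=0):
    """Generate (new_texts, new_labels) that lift every minority class to the majority count."""
    rng = random.Random(seed)
    by_class = {}
    for t, y in zip(texts, labels):
        by_class.setdefault(y, []).append(t)
    new_texts, new_labels = [], []
    for label, need in class_deficits(labels).items():
        pool = by_class[label]  # seeds come from real samples only, as in SMOTE
        for _ in range(need):
            i = rng.randrange(len(pool))
            j = nearest_neighbor(i, pool)
            new_texts.append(generate(build_prompt(label, pool[i], pool[j])))
            new_labels.append(label)
    return new_texts, new_labels


def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall, F1: unweighted mean of per-class scores."""
    classes = sorted(set(y_true) | set(y_pred))
    ps, rs, fs = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        ps.append(prec)
        rs.append(rec)
        fs.append(f1)
    n = len(classes)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n


if __name__ == "__main__":
    texts = ["great product", "love it", "works great", "terrible quality"]
    labels = ["pos", "pos", "pos", "neg"]
    # Stub generator; replace with a real LLM call.
    new_texts, new_labels = augment(texts, labels, generate=lambda prompt: "<synthetic text>")
    print(len(new_texts), new_labels)
    # A majority-only classifier on the original, imbalanced labels.
    print(macro_prf(labels, ["pos"] * len(labels)))
```

With the toy labels above, a classifier that always predicts `pos` reaches 75% accuracy. Its macro recall is only 0.5, because the `neg` class contributes zero recall. This gap is the one the augmentation step aims to close.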
Computer Science