Semi-supervised text classification algorithm with data augmentation and similar pseudo-labels

Sheng Xiaohui, Shen Hailong
DOI: https://doi.org/10.19734/j.issn.1001-3695.2022.08.0412
2023-01-01
Abstract: To reduce dependence on labeled data and make full use of large amounts of unlabeled data, this paper proposes STAP (semi-supervised text classification algorithm with data augmentation and similar pseudo-labels). The algorithm uses the EPiDA (easy plug-in data augmentation) framework and self-training to expand a small amount of labeled data. It uses consistency training and similar pseudo-labels to model two relationships: the one between each unlabeled sample and its augmented samples, and the one between similar unlabeled samples with high confidence. Training is constrained by three losses (a supervised cross-entropy loss, an unsupervised consistency loss, and an unsupervised pair loss), which improves the quality of the unlabeled data. Experiments on four text classification datasets show that STAP clearly improves on other classical text classification algorithms.
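
The abstract names three loss terms but does not give their formulas, so the following is only a minimal sketch of how such a combined objective might look, not the authors' implementation. The function names, the use of KL divergence for the consistency term, the Bhattacharyya coefficient as the similarity measure, the thresholds, and the weights `lambda_u` and `lambda_p` are all assumptions made for illustration.

```python
import math

def cross_entropy(probs, label):
    """Supervised loss: negative log-probability of the true class."""
    return -math.log(probs[label])

def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions (q must be positive where p is)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def bhattacharyya(p, q):
    """Similarity in [0, 1]; equals 1 when the two distributions are identical."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def pair_loss(p, q, conf_threshold=0.9, sim_threshold=0.8):
    """Hypothetical pair loss (assumed form, not taken from the paper).

    Only pairs where p is a confident prediction and p, q are already
    similar contribute; such pairs are pushed to agree further.
    """
    if max(p) < conf_threshold:
        return 0.0
    sim = bhattacharyya(p, q)
    if sim < sim_threshold:
        return 0.0
    return 1.0 - sim

def total_loss(labeled, unlabeled, lambda_u=1.0, lambda_p=1.0):
    """Weighted sum of the three terms named in the abstract.

    labeled:   list of (predicted_probs, true_label)
    unlabeled: list of (probs_on_original, probs_on_augmented)
    lambda_u, lambda_p: assumed weighting hyperparameters.
    """
    sup = sum(cross_entropy(p, y) for p, y in labeled) / len(labeled)
    cons = sum(kl_divergence(p, pa) for p, pa in unlabeled) / len(unlabeled)
    originals = [p for p, _ in unlabeled]
    pairs = [(a, b) for i, a in enumerate(originals)
             for j, b in enumerate(originals) if i != j]
    pair = sum(pair_loss(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return sup + lambda_u * cons + lambda_p * pair

# Toy batch: one labeled sample and two unlabeled samples with augmentations.
labeled = [([0.7, 0.3], 0)]
unlabeled = [([0.95, 0.05], [0.90, 0.10]),
             ([0.92, 0.08], [0.85, 0.15])]
loss = total_loss(labeled, unlabeled)
```

In this sketch the pair term is gated by both a confidence threshold and a similarity threshold, so low-confidence or dissimilar pairs contribute nothing. That is one plausible reading of "similar unlabeled data with high confidence"; the paper itself may define the term differently.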