Semi-Supervised Text Classification Via Self-Paced Semantic-Level Contrast

Yu Xia, Kai Zhang, Kaijie Zhou, Rui Wang, Xiaohui Hui
DOI: https://doi.org/10.1007/978-3-031-33377-4_37
2023-01-01
Abstract: Semi-Supervised Text Classification (SSTC) aims to exploit discriminative information in unlabeled texts in a self-training manner. These methods first pre-train a deep classifier on labeled texts; recent works then fine-tune the model on the combination of labeled texts and pseudo-labeled texts generated by the pre-trained classifier. However, the model's performance depends heavily on the quality of these pseudo-labels. To tackle this issue, we propose a novel approach, Self-paced Semantic-level Contrastive Learning (S²CL), for SSTC. S²CL employs a self-paced pseudo-label generator to improve the quality of pseudo-labels. We further propose robust supervised learning and semantic-level contrastive learning modules to alleviate the model's over-sensitivity to pseudo-label quality. Empirically, S²CL significantly outperforms state-of-the-art methods on benchmark datasets, with improvements of 0.3% to 4.6% in Micro-F1 and 0.3% to 11.1% in Macro-F1. Furthermore, we construct a practical dataset, Events39, as a benchmark for evaluating the robustness of SSTC methods against domain shift. Experimental results demonstrate the effectiveness of S²CL on Events39.
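The abstract does not specify how the self-paced pseudo-label generator works. As a rough illustration of the general idea behind self-paced pseudo-labeling (admit the most confident, "easy" unlabeled texts first, then gradually enlarge the pseudo-labeled pool), the following Python sketch may help. It is not the authors' S²CL generator: the linear schedule, the function name `self_paced_select`, and the `start_frac` parameter are hypothetical choices made for illustration, and the input is assumed to be a matrix of class probabilities produced by an already pre-trained classifier.

```python
import numpy as np


def self_paced_select(probs, round_idx, total_rounds, start_frac=0.2):
    """Hypothetical self-paced pseudo-label selection (not the S²CL implementation).

    probs: array of shape (n_unlabeled, n_classes) with predicted class probabilities.
    Returns the indices of the admitted unlabeled texts and their argmax pseudo-labels.
    The admitted fraction grows linearly from start_frac to 1.0 across rounds,
    so high-confidence ("easy") texts are pseudo-labeled first.
    """
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)  # classifier confidence per unlabeled text
    labels = probs.argmax(axis=1)   # candidate pseudo-label per unlabeled text

    # Assumed linear self-paced schedule for the admitted fraction of the pool.
    if total_rounds <= 1:
        frac = 1.0
    else:
        frac = start_frac + (1.0 - start_frac) * round_idx / (total_rounds - 1)
    k = max(1, int(round(frac * len(confidence))))

    # Rank by descending confidence (stable for ties) and keep the top-k texts.
    order = np.argsort(-confidence, kind="stable")[:k]
    return order, labels[order]
```

For example, with five unlabeled texts and three rounds, round 0 keeps only the single most confident text, round 1 keeps the top three, and round 2 admits all five with their argmax pseudo-labels. Ranking by an admitted fraction guarantees that the pool grows every round; a decaying confidence threshold would be an equally plausible alternative schedule.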