Improving Text Classification Via Computing Category Correlation Matrix from Text Graph

Zhen Zhang, Mengqiu Liu, Xiyuan Jia, Gongxun Miao, Xin Wang, Hao Ni, Guohua Wu
DOI: https://doi.org/10.1016/j.csl.2024.101688
IF: 3.252
2024-01-01
Computer Speech & Language
Abstract: In text classification tasks, models have shown remarkable accuracy across various datasets. However, when certain categories in a dataset are too similar, models confuse them and misclassify some samples. This paper addresses the problem by building a three-layer text graph over the corpus and using it to compute a Category Correlation Matrix (CCM). It also introduces category-adaptive contrastive learning on the text embeddings produced by the encoder, improving the model’s ability to distinguish between samples in easily confused categories. Soft labels generated from the CCM guide the classifier and prevent the model from becoming overconfident, as it can when trained on one-hot vectors. Experimental evaluations on three text encoders and six datasets demonstrate the efficacy of the approach.
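The abstract does not give the formula that turns the CCM into soft labels, so the following is only a minimal sketch of the general idea under stated assumptions: the target for a sample keeps most of its probability mass on the true category and spreads a small share over the other categories in proportion to their correlation with it. The function name `soft_labels`, the smoothing weight `alpha`, and the blending rule itself are hypothetical and are not taken from the paper.

```python
def soft_labels(ccm, label, alpha=0.1):
    """Build a soft target for `label` from a category correlation matrix.

    ccm   : square list of lists; ccm[i][j] is a non-negative correlation
            score between categories i and j (assumed layout).
    label : index of the true category.
    alpha : share of probability mass moved off the true category
            (hypothetical hyperparameter, not from the paper).
    """
    k = len(ccm)
    # Correlations of the true category with every *other* category.
    row = [ccm[label][j] if j != label else 0.0 for j in range(k)]
    total = sum(row)
    if total == 0:
        # No correlated categories: fall back to the plain one-hot vector.
        return [1.0 if j == label else 0.0 for j in range(k)]
    # Spread alpha over the other categories in proportion to correlation,
    # and keep the remaining 1 - alpha on the true category.
    target = [alpha * r / total for r in row]
    target[label] = 1.0 - alpha
    return target


# Toy matrix: categories 0 and 1 are strongly correlated, category 2 is not.
toy_ccm = [
    [1.0, 0.6, 0.2],
    [0.6, 1.0, 0.2],
    [0.2, 0.2, 1.0],
]
target = soft_labels(toy_ccm, label=0, alpha=0.2)
```

With such a target, a sample from category 0 is penalised less for assigning some probability to the similar category 1 than to the dissimilar category 2, which is one way a classifier can be kept from becoming overconfident on a one-hot vector.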