Label Distribution Learning-Enhanced Dual-KNN for Text Classification

Bo Yuan, Yulin Chen, Zhen Tan, Wang Jinyan, Huan Liu, Yin Zhang
DOI: https://doi.org/10.1137/1.9781611978032.47
2024-01-01
Abstract: Many text classification methods introduce external information (e.g., label descriptions and knowledge bases) to improve classification performance. In contrast, internal information generated by the model itself during training, such as text embeddings and predicted label probability distributions, is poorly exploited at prediction time. In this paper, we focus on leveraging this internal information and propose a dual k-nearest-neighbor (DkNN) framework with two kNN modules that retrieve several neighbors from the training set and augment the label distribution. A kNN module, however, is easily confused and may produce incorrect predictions when it retrieves nearest neighbors from noisy datasets (datasets with labeling errors) or similar datasets (datasets with similar labels). To address this issue, we also introduce a label distribution learning module that learns label similarity and generates a better label distribution, helping the model distinguish texts more effectively. This module eases model overfitting and improves final classification performance, and thereby enhances the quality of the neighbors retrieved by the kNN modules during inference. Extensive experiments on benchmark datasets verify the effectiveness of our method.
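The retrieval idea in the abstract can be illustrated with a minimal sketch. It assumes that one kNN module searches over stored text embeddings and the other over stored predicted label distributions, that the two retrieved label distributions are simply averaged, and that the result is linearly interpolated with the model's own prediction. The abstract does not specify these details, so the function names, the `temperature` and `lam` parameters, and the combination rule are all hypothetical. The label distribution learning module is not covered here.

```python
import numpy as np


def knn_label_distribution(query, keys, labels, num_classes, k=2, temperature=1.0):
    """Turn the labels of the k nearest stored items into a label distribution."""
    # Euclidean distance from the query to every stored key.
    dists = np.linalg.norm(keys - query, axis=1)
    nearest = np.argsort(dists)[:k]
    # Closer neighbors get larger weights (softmax over negative distances).
    logits = -dists[nearest] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Accumulate each neighbor's weight onto its class label.
    dist = np.zeros(num_classes)
    np.add.at(dist, labels[nearest], weights)
    return dist


def dual_knn_predict(model_probs, query_emb, train_embs, train_probs,
                     train_labels, k=2, lam=0.5):
    """Combine the model's prediction with two kNN label distributions.

    Assumption: equal-weight averaging of the two kNN modules, then linear
    interpolation with weight `lam`. The paper may combine them differently.
    """
    num_classes = model_probs.shape[0]
    p_emb = knn_label_distribution(query_emb, train_embs, train_labels, num_classes, k)
    p_prob = knn_label_distribution(model_probs, train_probs, train_labels, num_classes, k)
    p_knn = 0.5 * (p_emb + p_prob)
    return (1.0 - lam) * model_probs + lam * p_knn


# Toy datastore: four training texts, two classes (made-up values).
train_embs = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
train_probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
train_labels = np.array([0, 0, 1, 1])

result = dual_knn_predict(
    model_probs=np.array([0.6, 0.4]),
    query_emb=np.array([0.1, 0.2]),
    train_embs=train_embs,
    train_probs=train_probs,
    train_labels=train_labels,
)
print(result)
```

In this toy case both kNN modules retrieve only class-0 neighbors, so the combined distribution shifts further toward class 0 than the model's own prediction.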