Enhancing Robust Text Classification via Category Description

Xin Gao, Zhengye Zhu, Xu Chu, Yasha Wang, Wenjie Ruan, Junfeng Zhao
DOI: https://doi.org/10.1109/ICDM54844.2022.00025
2022-01-01
Abstract: Despite the success of deep neural networks on text classification, their large capacity also leads them to capture task-irrelevant patterns such as label noise. Label noise is usually introduced into the data during label collection and causes nontrivial declines in performance due to the memorization effect. Though effort has been devoted to combating label noise in other domains such as image classification, those approaches need high-quality input features to discover task-relevant patterns before the model memorizes the label noise. However, this requirement for high-quality input features is hard to satisfy in text classification due to the nature of natural language. To combat label noise with low-quality input features in text classification, we propose a novel framework that exploits external category descriptions to construct prototypes, which are used to denoise the input representation and alleviate overfitting. A challenge remains, however: external category descriptions drawn from other corpora can be semantically discrepant with the underlying task-specific classes in the training corpus. To align their semantics, we propose two regularizers that penalize sample-wise semantic-based deviations at the local level and class-wise structure-based deviations at the global level, respectively. Our extensive experiments across two open datasets and one real-world case study demonstrate that our method is superior to state-of-the-art baselines under various settings of label noise.
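The abstract does not give the form of the prototypes or the two regularizers, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes that each category prototype is the mean of the embeddings of its description sentences, that the local (sample-wise) penalty is a cosine distance between a sample and the prototype of its label, and that the global (class-wise) penalty compares the pairwise similarity structure of class centroids with that of the prototypes. All function names and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def build_prototypes(description_embeddings):
    """Assumption: a prototype is the mean embedding of a category's descriptions."""
    return {label: mean_vector(vecs) for label, vecs in description_embeddings.items()}

def local_penalty(samples, labels, prototypes):
    """Assumed sample-wise term: mean cosine distance to the labeled class prototype."""
    total = sum(1.0 - cosine(x, prototypes[y]) for x, y in zip(samples, labels))
    return total / len(labels)

def global_penalty(samples, labels, prototypes):
    """Assumed class-wise term: squared gap between the pairwise similarity
    structure of the class centroids and that of the prototypes."""
    classes = sorted(set(labels))
    centroids = {
        c: mean_vector([x for x, y in zip(samples, labels) if y == c])
        for c in classes
    }
    total, pairs = 0.0, 0
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            gap = cosine(centroids[a], centroids[b]) - cosine(prototypes[a], prototypes[b])
            total += gap * gap
            pairs += 1
    return total / pairs if pairs else 0.0

# Toy 2-d "embeddings" standing in for encoder outputs (hypothetical values).
prototypes = build_prototypes({"sports": [[1.0, 0.0], [1.0, 0.0]], "tech": [[0.0, 1.0]]})
clean = [[1.0, 0.0], [0.0, 1.0]]
noisy = [[1.0, 0.0], [1.0, 0.0]]   # second sample is mislabeled as "tech"
labels = ["sports", "tech"]
```

In this toy setup both penalties are zero for the cleanly labeled samples and grow when a "sports"-like sample carries the "tech" label, which is the kind of signal a training loss could use to resist memorizing noisy labels.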