A Chinese Character-Level and Word-Level Complementary Text Classification Method.

Wentong Chen, Chunxiao Fan, Yuexin Wu, Zhixiong Lou
DOI: https://doi.org/10.1109/taai51410.2020.00042
2020-01-01
Abstract: Text classification is a basic but important task in natural language processing. Mainstream classification methods now mostly rely on deep learning, which shows better accuracy and stability in English text classification. Unlike English text classification, Chinese text classification requires choosing the granularity at which the text is decomposed into features. The two commonly used feature granularities are word-level features and character-level features. The former incur semantic loss during word segmentation, while the latter cannot use the high-level semantic features in pre-trained word vectors. We propose a method that fuses word-level and character-level information with an attention mechanism. We train CWC-Net, which combines the two kinds of features so that the embedded information of characters and words is complementary, improving the network's semantic understanding of Chinese text and reducing semantic loss. Comparative experiments on four Chinese text datasets, covering topic classification and emotion analysis, show that our model is more accurate than traditional models that rely only on word-level features or only on character-level features. This verifies that fusing word-level and character-level features improves model capability.
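The abstract does not specify how CWC-Net computes its attention weights, so the following is only a minimal sketch of the general idea of attention-based fusion of two feature vectors, not the authors' architecture. The function name `attention_fuse`, the `query` vector (standing in for a learned parameter), and the toy feature values are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(word_vec, char_vec, query):
    """Fuse a word-level and a character-level feature vector.

    Hypothetical sketch: each vector is scored against `query`
    (assumed to be a learned parameter in a real model), the two
    scores are normalized with softmax, and the fused feature is
    the weighted sum of the two inputs.
    """
    if not (len(word_vec) == len(char_vec) == len(query)):
        raise ValueError("all vectors must have the same dimension")
    weights = softmax([dot(query, word_vec), dot(query, char_vec)])
    fused = [weights[0] * w + weights[1] * c
             for w, c in zip(word_vec, char_vec)]
    return fused, weights


# Toy inputs (made up for illustration, not taken from the paper).
word_vec = [0.2, 0.7, -0.1]
char_vec = [0.5, -0.3, 0.4]
query = [1.0, 0.0, 1.0]

fused, weights = attention_fuse(word_vec, char_vec, query)
print("attention weights (word, char):", weights)
print("fused feature:", fused)
```

In this sketch the weights always sum to 1, so the fused vector is a convex combination of the two inputs; whichever granularity scores higher against the query contributes more. A real model would apply this per token or per sequence position and learn the scoring parameters end to end.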