Compositional Recurrent Neural Networks for Chinese Short Text Classification

Yujun Zhou, Bo Xu, Jiaming Xu, Lei Yang, Changliang Li, Bo Xu
DOI: https://doi.org/10.1109/wi.2016.0029
2016-10-01
Abstract: Word segmentation is the first step in Chinese natural language processing, and errors introduced during segmentation can propagate through the whole system. To reduce the impact of word segmentation and improve the overall performance of Chinese short text classification, we propose a hybrid model that combines character-level and word-level features using a recurrent neural network (RNN) with long short-term memory (LSTM). Integrating character-level features into word-level features recovers semantic information lost through segmentation errors and reduces incorrect semantic associations. The resulting feature representation suppresses segmentation errors while preserving most of the semantic features of the sentence. The whole model is trained end-to-end on a supervised Chinese short text classification task. Results demonstrate that the proposed model represents Chinese short text effectively, and its performance on 32-class and 5-class categorization exceeds that of several well-known methods.
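The abstract does not say exactly how character-level features are merged into word-level features, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each segmented word is represented by its word embedding concatenated with the mean of its character embeddings, that an LSTM reads these per-word vectors, and that a softmax layer over the final hidden state produces class probabilities (5 or 32 classes). All names (`HybridClassifier`, `word_feature`, `LSTMCell`) are hypothetical, weights are random and untrained, and only the forward pass is shown; the end-to-end training described above is omitted.

```python
import math
import random


def matvec(W, x):
    # Dense matrix-vector product on plain Python lists
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class LSTMCell:
    """Standard LSTM cell: input (i), forget (f), output (o) gates and candidate (c)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x; h_prev]
        self.W = {g: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for g in "ifoc"}
        self.b = {g: [0.0] * hidden_dim for g in "ifoc"}

    def step(self, x, h_prev, c_prev):
        z = x + h_prev  # list concatenation [x; h_prev]
        pre = {g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])] for g in "ifoc"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["c"]]
        c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c

    def run(self, xs):
        # Read the sequence left to right and return the final hidden state
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in xs:
            h, c = self.step(x, h, c)
        return h


class HybridClassifier:
    """Hypothetical hybrid model: word embedding + mean character embedding per word."""

    def __init__(self, word_dim=8, char_dim=4, hidden_dim=6, num_classes=5, seed=0):
        self.rng = random.Random(seed)
        self.word_dim, self.char_dim = word_dim, char_dim
        self.word_emb, self.char_emb = {}, {}
        self.lstm = LSTMCell(word_dim + char_dim, hidden_dim, self.rng)
        self.W_out = rand_matrix(num_classes, hidden_dim, self.rng)

    def _embed(self, table, key, dim):
        # Toy stand-in for trained embeddings: create a random vector on first use
        if key not in table:
            table[key] = [self.rng.uniform(-0.1, 0.1) for _ in range(dim)]
        return table[key]

    def word_feature(self, word):
        # Assumed fusion: concatenate the word vector with the average of its
        # character vectors, so characters still contribute if segmentation is wrong
        w = self._embed(self.word_emb, word, self.word_dim)
        chars = [self._embed(self.char_emb, ch, self.char_dim) for ch in word]
        c = [sum(col) / len(chars) for col in zip(*chars)]
        return w + c

    def predict_proba(self, words):
        h = self.lstm.run([self.word_feature(w) for w in words])
        logits = matvec(self.W_out, h)
        m = max(logits)  # subtract max for a numerically stable softmax
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]
```

For example, `HybridClassifier(num_classes=5).predict_proba(["今天", "股市", "上涨"])` returns a list of five probabilities summing to 1 for an already segmented sentence. Averaging character vectors is just one simple fusion choice; a character-level RNN or a gating mechanism could be swapped in without changing the rest of the pipeline.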