Extremely Short Chinese Text Classification Method Based on Bidirectional Semantic Extension

Yongzeng Yue, Yuhong Zhang, Xuegang Hu, Peipei Li
DOI: https://doi.org/10.1088/1742-6596/1437/1/012026
2020-01-01
Journal of Physics: Conference Series
Abstract: Short text classification methods have achieved significant progress and are widely applied to text data such as Twitter and Weibo posts. However, extremely short Chinese texts such as tax invoice data differ from traditional short texts: they lack contextual semantic information, have sparse features, and are extremely short. Existing short text classification methods struggle to achieve satisfactory performance on such texts. To address these problems, this paper proposes a text classification method based on bidirectional semantic extension for extremely short texts such as Chinese tax invoice data. Specifically, first, a Chinese knowledge graph is introduced for bidirectional semantic extension of both the texts and the label data, which expands the extremely short texts and eases the problem of feature sparseness; second, hash vectorization is used to avoid the semantic problems caused by the lack of contextual information. Experimental results on a real tax invoice dataset demonstrate the effectiveness of the proposed method.
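The hash vectorization step mentioned in the abstract can be illustrated with a minimal sketch of the general hashing trick. This is not the authors' implementation: the function names, the character-bigram features, the vector size, and the `extensions` argument (standing in for terms a knowledge graph lookup might return) are all assumptions made for illustration.

```python
import hashlib


def char_ngrams(text, n=2):
    """Return overlapping character n-grams; a text of length <= n yields itself."""
    if len(text) <= n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def hash_vectorize(text, extensions=(), dim=64, n=2):
    """Map a short text plus optional extension terms to a fixed-length count vector.

    `extensions` is a hypothetical placeholder for related terms obtained
    elsewhere (for example from a knowledge graph); no lookup happens here.
    """
    vec = [0] * dim
    for piece in (text, *extensions):
        for gram in char_ngrams(piece, n):
            # MD5 is used only as a stable hash so indices are reproducible
            # across runs (Python's built-in hash() is salted for strings).
            digest = hashlib.md5(gram.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % dim
            vec[index] += 1
    return vec


# Hypothetical invoice item ("mineral water") with one made-up extension term ("beverage").
vector = hash_vectorize("矿泉水", extensions=("饮料",))
```

Because each feature is mapped directly to an index, no vocabulary has to be built or stored, which is convenient when texts are only a few characters long. Distinct n-grams can collide in the same index, so `dim` trades memory against collision rate.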