Semantic Enhancement and Multi-level Label Embedding for Chinese News Headline Classification

Jiangnan Qi,Yuan Rao,Ling Sun,Xiong Yang
DOI: https://doi.org/10.1109/iSAI-NLP48611.2019.9045404
2019-01-01
Abstract:News headline classification is a specific example of short text classification, which aims to extract semantic information from the short text and classify it accurately. It can provide a fast classification method for data of various kinds of news media, thus arousing the common concern of academia and industry. Most short text classification methods are based on the semantic expansion of external knowledge, which is unable to expansion dynamically in real time and make full use of label information. To overcome these problems, we propose a novel method which consists of three parts: semantic enhancement, multi-dimensional feature fusion network and multi-level label embedding. Firstly, the word-level semantic information are embedded into the character encoding from pre-train model to enhance semantic features. Secondly, both of Bi-GRU and multi-scale CNN are used to extract sequence and local features of text to enhance the semantic representation of the sentence. Furthermore, the multi-level label embedding is used to filter textual vector and assist classification in the word and sentence level respectively. Experimental results on NLPCC 2017 Chinese news headline classification task show that our model achieves 84.74% of accuracy and 84.75% of F1, improves over the best baseline model by 1.5% and 1.6%, respectively, and reaches the state-of-the-art performance.
What problem does this paper attempt to address?