A Semantic Representation Enhancement Method for Chinese News Headline Classification

Zhongbo Yin, Jintao Tang, Chengsen Ru, Wei Luo, Zhunchen Luo, Xiaolei Ma
DOI: https://doi.org/10.1007/978-3-319-73618-1_27
2017-01-01
Abstract: Recently, there has been increasing research interest in short texts such as news headlines. Because short text is inherently sparse, current text classification methods perform poorly when applied to news headlines. To overcome this problem, this paper proposes a novel method that enhances the semantic representation of headlines. First, we expand the word features of each headline with keywords extracted from the most similar news articles. Second, we pre-train the word embeddings on a news-domain corpus to enhance the word representation. Finally, we adopt the fastText classifier, which uses a linear method to classify text quickly and accurately, for news headline classification. On the Chinese news headline categorization task at NLPCC2017, the proposed method achieved an F-measure of 83.1%, ranking first among 33 teams.
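The feature-expansion step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the most similar news is retrieved or how keywords are scored, so the TF-IDF weighting, the cosine similarity, the function names, and the toy corpus below are all assumptions made for illustration, not the authors' implementation. Headlines and news bodies are assumed to be already segmented into words.

```python
import math
from collections import Counter

# Toy, pre-segmented news bodies (hypothetical data for illustration only).
NEWS = [
    ["央行", "宣布", "降息", "利率", "下调", "利率", "市场", "贷款"],
    ["球队", "夺冠", "比赛", "进球", "球迷", "比赛", "联赛"],
]


def idf_table(docs):
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(tokens, idf, n_docs):
    """Map a token list to {token: tf * idf}; unseen tokens get the largest idf."""
    unseen = math.log(1 + n_docs) + 1.0
    return {tok: tf * idf.get(tok, unseen) for tok, tf in Counter(tokens).items()}


def cosine(a, b):
    """Cosine similarity between two sparse {token: weight} vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def expand_headline(headline, news_docs, k=3):
    """Append the top-k keywords of the most similar news body to the headline."""
    if not news_docs:
        return list(headline)
    idf = idf_table(news_docs)
    n = len(news_docs)
    doc_vecs = [tfidf(doc, idf, n) for doc in news_docs]
    query = tfidf(headline, idf, n)
    # Assumption: "most similar news" means highest TF-IDF cosine similarity.
    best = max(doc_vecs, key=lambda vec: cosine(query, vec))
    seen = set(headline)
    candidates = [(tok, w) for tok, w in best.items() if tok not in seen]
    # Highest weight first; ties are broken by token so the result is deterministic.
    candidates.sort(key=lambda tw: (-tw[1], tw[0]))
    return list(headline) + [tok for tok, _ in candidates[:k]]


if __name__ == "__main__":
    print(expand_headline(["央行", "降息"], NEWS, k=2))
```

The expanded token list would then be passed to the classifier in place of the original headline, which gives a sparse two- or three-word headline more features to match against.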