A News Headlines Classification Method Based on the Fusion of Related Words

Yongguan Wang, Binjie Meng, Pengyuan Liu, Erhong Yang
DOI: https://doi.org/10.1007/978-3-319-73618-1_71
2017-01-01
Abstract: Short text classification is challenging because each text contains only a few words, usually fewer than 20, which leads to feature sparsity. In this paper, we propose a method of extending short texts to cope with this data sparsity problem. We then combine the extension of each short text with the word vectors of the words it contains, trained by a word2vec model on a large-scale corpus, to form a new representation. This new representation serves as the input to a neural bag-of-words (NBOW) model. We evaluate the method on NLPCC 2017 Evaluation Task 2. The experimental results show that short text extension combined with the NBOW model outperforms the baselines and achieves excellent performance on the news headline classification task.
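The abstract does not say how related words are chosen or exactly how the extension is fused with the word vectors, so the following is only a minimal sketch under stated assumptions. It assumes that related words are the k nearest neighbours of each headline word by cosine similarity in the embedding space, that the extension is fused by appending those words to the headline, and that the NBOW model is the common formulation: average the word vectors, then apply a linear softmax layer. The toy three-dimensional embeddings, the helper names (`extend_headline`, `nbow_vector`, `nbow_classify`), and the untrained weights are hypothetical stand-ins for word2vec vectors and learned parameters.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy embeddings standing in for word2vec vectors trained on a large corpus.
EMBEDDINGS: Dict[str, List[float]] = {
    "stock": [1.0, 0.0, 0.0],
    "market": [0.9, 0.1, 0.0],
    "shares": [0.8, 0.2, 0.0],
    "football": [0.0, 1.0, 0.0],
    "match": [0.1, 0.9, 0.0],
    "goal": [0.0, 0.8, 0.2],
}
# Hypothetical, untrained softmax-layer parameters (one row per class label).
LABELS = ["finance", "sports"]
WEIGHTS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
BIASES = [0.0, 0.0]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def extend_headline(tokens: List[str], embeddings: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Append up to k related words per in-vocabulary token.

    Assumption: "related" means nearest neighbours by cosine similarity.
    """
    extended = list(tokens)
    seen = set(tokens)
    for tok in tokens:
        if tok not in embeddings:
            continue  # out-of-vocabulary tokens contribute no related words
        candidates = sorted(
            (w for w in embeddings if w not in seen),
            key=lambda w: cosine(embeddings[tok], embeddings[w]),
            reverse=True,
        )
        for w in candidates[:k]:
            extended.append(w)
            seen.add(w)
    return extended


def nbow_vector(tokens: List[str], embeddings: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of in-vocabulary tokens (zero vector if none are known)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        dim = len(next(iter(embeddings.values())))
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def nbow_classify(tokens, embeddings, weights, biases, labels) -> Tuple[str, List[float]]:
    """Linear softmax layer over the averaged word vector; returns (label, probabilities)."""
    x = nbow_vector(tokens, embeddings)
    scores = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


if __name__ == "__main__":
    headline = extend_headline(["stock", "rises"], EMBEDDINGS, k=1)
    print(headline, nbow_classify(headline, EMBEDDINGS, WEIGHTS, BIASES, LABELS))
```

Because averaging is independent of headline length, appending related words mainly pulls the averaged vector toward the topic those words share. This is one plausible reading of why extension can ease sparsity, not a statement about the paper's actual fusion method or results.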