Semantic Augmented Topic Model over Short Text

Lingyun Li, Yawei Sun, Cong Wang
DOI: https://doi.org/10.1109/ccis.2018.8691313
2018-01-01
Abstract: With the rapid development of the Internet and mobile devices, users produce a vast number of short texts, which pose great challenges to topic modeling because of their severe context sparsity. Traditional topic models perform poorly on short text because it lacks word co-occurrence patterns. The bi-term topic model (BTM), an effective approach, models word co-occurrence directly over the whole corpus and performs better than conventional topic models. However, BTM considers only the frequency of bi-terms and ignores the latent semantic information between them, so semantically similar words are at great risk of being grouped under different topics. In this paper, we propose a latent semantic augmented bi-term topic model (LS-BTM), which incorporates semantic information as prior knowledge to infer topics more reasonably. Experimental results show that our model outperforms other short text topic models on real-world data.
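As a rough illustration of the corpus-level word co-occurrence that BTM relies on, the following minimal sketch extracts bi-terms (unordered word pairs occurring in the same short text) and counts them across a toy corpus. The function names and the example documents are hypothetical and are not taken from the paper; the sketch covers only bi-term counting, not topic inference or the semantic prior of LS-BTM.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (bi-term) in one short text.

    Each pair is sorted so that (a, b) and (b, a) count as the same bi-term.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(docs):
    """Aggregate bi-term frequencies over the whole corpus."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Toy corpus (hypothetical): each document is an already tokenized short text.
docs = [
    ["apple", "iphone", "launch"],
    ["iphone", "apple", "price"],
]
counts = corpus_biterm_counts(docs)
print(counts[("apple", "iphone")])
```

Because the counts are pooled over all documents, a pair such as ("apple", "iphone") accumulates evidence even though each individual text is very short. These counts capture frequency only, which is the limitation the abstract attributes to BTM.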