A LDA Model Based Topic Detection Method

Lantian Guo, Yang Li, Dejun Mu, Tao Yang, Zhe Li
DOI: https://doi.org/10.3969/j.issn.1000-2758.2016.04.022
2016-01-01
Abstract: Topic detection is one of the most important techniques in hot topic extraction and evolution tracking. Detecting topics from a large number of short texts in social networks is difficult for two reasons: high dimensionality hinders processing efficiency, and the maldistribution of topics makes them unclear. To address these challenges, we propose a new LDA (Latent Dirichlet Allocation) model based topic detection method called the CBOW-LDA topic modeling method. It uses the CBOW (Continuous Bag-of-Words) method to generate word vectors and then clusters the words by vector similarity. This reduces the dimensionality of the LDA output and makes the topics clearer. An analysis of topic perplexity on a real-world dataset shows that topics detected by our method have lower perplexity than those obtained with word-frequency-weighted vectors. With the same number of topic words, perplexity is reduced by about 3%.
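The two ideas named in the abstract, clustering words by vector similarity to shrink the vocabulary and comparing models by perplexity, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which clustering algorithm or similarity threshold the authors use, so a simple greedy cosine-similarity pass stands in for it; the three-dimensional `toy_vectors` are hand-written placeholders for real CBOW output; and `cluster_words` and `perplexity` are hypothetical helper names, not functions from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(vectors, threshold=0.9):
    """Greedy single-pass clustering (an assumed stand-in for the paper's step).

    A word joins the first cluster whose seed vector has cosine similarity
    of at least `threshold`; otherwise it becomes the seed of a new cluster.
    Returns a mapping from each word to the seed word of its cluster.
    """
    seeds = []        # (seed_word, seed_vector) pairs, in creation order
    assignment = {}
    for word, vec in vectors.items():
        for seed_word, seed_vec in seeds:
            if cosine(vec, seed_vec) >= threshold:
                assignment[word] = seed_word
                break
        else:
            seeds.append((word, vec))
            assignment[word] = word
    return assignment


def perplexity(token_probs):
    """Perplexity = exp of the negative mean log-probability per token."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))


# Hypothetical toy vectors standing in for trained CBOW embeddings.
toy_vectors = {
    "football": [0.90, 0.10, 0.00],
    "soccer":   [0.88, 0.12, 0.02],
    "election": [0.05, 0.95, 0.10],
    "vote":     [0.07, 0.90, 0.12],
    "phone":    [0.00, 0.10, 0.95],
}

assignment = cluster_words(toy_vectors)

# Replace each word with its cluster seed before handing documents to LDA.
docs = [["football", "soccer", "vote"], ["election", "vote", "phone"]]
reduced_docs = [[assignment[w] for w in doc] for doc in docs]

vocab_before = {w for doc in docs for w in doc}
vocab_after = {w for doc in reduced_docs for w in doc}
```

On this toy input the vocabulary shrinks from five words to three cluster labels, which is the kind of dimensionality reduction the abstract describes. In a real pipeline the vectors would come from a trained CBOW model and the reduced documents would be passed to an LDA implementation, whose held-out token probabilities could then be scored with a perplexity function like the one above.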