Biterm Pseudo Document Topic Model for Short Text

Lan Jiang, Hengyang Lu, Ming Xu, Chongjun Wang
DOI: https://doi.org/10.1109/ictai.2016.0134
2016-01-01
Abstract: In the past few years, we have witnessed the rapid development of online social media, from which we can access a wide variety of short texts. Understanding the topic patterns of these short texts is significant. Traditional topic models, such as LDA, are not suitable for short text topic analysis because of data sparsity. Much effort has been made to solve this problem. However, there is still substantial room to improve the effectiveness of these short-text-specific methods. In this paper, we propose a novel method based on a word co-occurrence network, referred to as the biterm pseudo document topic model (BPDTM), which extends the previous biterm topic model (BTM) for short text. We use the word co-occurrence network to construct biterm pseudo documents. The proposed model is promising because it represents words by their semantically adjacent biterms and is able to model the corpus-level semantic relation between two words. In addition, BPDTM naturally lengthens the documents, which alleviates the impact of data sparsity on performance. Experiments demonstrate that our model outperforms two baselines, LDA and BTM, showing its effectiveness on the short text topic modeling task.
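The abstract does not spell out how the biterm pseudo documents are built, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It treats a biterm as an unordered pair of distinct words co-occurring in one short text (as in BTM), builds a word co-occurrence network whose edge weights are document counts, and then assumes that the pseudo document of a word is the list of biterms (edges) touching that word, repeated by edge weight. The function names and the toy corpus are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def extract_biterms(tokens):
    """Return the unordered pairs of distinct words (biterms) in one short text."""
    # Simplification: duplicate words inside one text are collapsed.
    unique = sorted(set(tokens))
    return list(combinations(unique, 2))


def build_cooccurrence_network(docs):
    """Map each biterm (an edge between two words) to the number of texts it occurs in."""
    edges = defaultdict(int)
    for tokens in docs:
        for biterm in extract_biterms(tokens):
            edges[biterm] += 1
    return dict(edges)


def build_pseudo_documents(edges):
    """ASSUMPTION: a word's pseudo document is every biterm touching it,
    repeated once per co-occurrence. The paper may define this differently."""
    pseudo = defaultdict(list)
    for (u, v), weight in edges.items():
        pseudo[u].extend([(u, v)] * weight)
        pseudo[v].extend([(u, v)] * weight)
    return dict(pseudo)


# Hypothetical toy corpus of tokenized short texts.
docs = [
    ["apple", "iphone", "launch"],
    ["apple", "fruit", "juice"],
    ["iphone", "launch", "event"],
]
edges = build_cooccurrence_network(docs)
pseudo_docs = build_pseudo_documents(edges)
```

In this toy corpus each original text has three words, while the pseudo document for "iphone" gathers biterms from two different texts, which illustrates how pooling over the network can lengthen the units a topic model is trained on.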