Topic Modeling for Short Texts Via Dual View Collaborate Optimization
Wei Liu, Yutao Huang, Yibing Guo, Ye Wang, Binxing Fang, Qing Liao
DOI: https://doi.org/10.1109/dsc55868.2022.00028
2022-01-01
Abstract: Short text topic modeling has attracted considerable research attention with the emergence of online social media platforms such as news websites, Twitter, and Facebook. Existing topic models for short texts mainly focus on alleviating the sparsity problem to improve topic modeling accuracy. However, most previous approaches introduce word embeddings trained on an external corpus to enrich the global semantic information in the topic modeling process, ignoring the local association information of the target corpus. Moreover, the global semantic information provided by word embeddings may not be entirely suitable for the target corpus, and in most cases it introduces noise that interferes with topic inference. This paper proposes a novel topic model for short texts called the Dual View Biterm Topic Model (DV-BTM). Specifically, DV-BTM constructs two views so that it learns local information from the target corpus while using global information to assist topic inference. The semantic similarity view provides global information obtained from word embeddings pre-trained on an external corpus. The Wordnet view is constructed from the target corpus itself and mainly provides local information about the corpus. Finally, collaborative optimization of the dual views improves the consistency of the extracted topics. Experiments on two real-world short text datasets demonstrate that DV-BTM achieves the best performance among the compared methods in topic coherence and text classification.
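The two kinds of information described in the abstract can be illustrated with a minimal sketch. This is not the DV-BTM model itself: the abstract does not specify how either view is built, so the sketch assumes that local information can be approximated by biterm (unordered word pair) co-occurrence counts in the target corpus and that global information can be approximated by cosine similarity between pre-trained word vectors. All function names, the toy documents, and the toy embedding vectors are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def local_view(docs):
    """Assumed local view: biterm co-occurrence counts over the target corpus."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def global_view(vocab, embeddings):
    """Assumed global view: pairwise cosine similarity of embedded words."""
    words = sorted(w for w in vocab if w in embeddings)
    return {
        (a, b): cosine(embeddings[a], embeddings[b])
        for a, b in combinations(words, 2)
    }


# Toy target corpus of tokenized short texts (hypothetical data).
docs = [
    ["apple", "iphone", "launch"],
    ["apple", "fruit", "juice"],
    ["iphone", "launch", "event"],
]
# Toy stand-in for embeddings pre-trained on an external corpus.
embeddings = {"apple": [1.0, 0.0], "iphone": [0.9, 0.1], "fruit": [0.0, 1.0]}

vocab = {word for tokens in docs for word in tokens}
local = local_view(docs)
similarity = global_view(vocab, embeddings)
```

In this toy data the embeddings place "apple" close to "iphone" and far from "fruit", while the corpus counts link "apple" to both equally. Disagreements of this kind between external and corpus-specific evidence are what motivate combining the two views rather than relying on embeddings alone.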