RankTopic: Ranking Based Topic Modeling

Dongsheng Duan,Yuhua Li,Ruixuan Li,Rui Zhang,Aiming Wen
DOI: https://doi.org/10.1109/ICDM.2012.12
2012-01-01
Abstract:Topic modeling has become a widely used tool for document management due to its superior performance. However, there are few topic models distinguishing the importance of documents on different topics. In this paper, we investigate how to utilize the importance of documents to improve topic modeling and propose to incorporate link based ranking into topic modeling. Specifically, topical pagerank is used to compute the topic level ranking of documents, which indicates the importance of documents on different topics. By retreating the topical ranking of a document as the probability of the document involved in corresponding topic, a generalized relation is built between ranking and topic modeling. Based on the relation, a ranking based topic model Rank Topic is proposed. With Rank Topic, a mutual enhancement framework is established between ranking and topic modeling. Extensive experiments on paper citation data and Twitter data are conducted to compare the performance of Rank Topic with that of some state-of-the-art topic models. Experimental results show that Rank Topic performs much better than some baseline models and is comparable with the state-of-the-art link combined relational topic model (RTM) in generalization performance, document clustering and classification by setting a proper balancing parameter. It is also demonstrated in both quantitative and qualitative ways that topics detected by Rank Topic are more interpretable than those detected by some baseline models and still competitive with RTM.
What problem does this paper attempt to address?