A Latent Topic Model for Linked Documents

Zhen Guo, Shenghuo Zhu, Yun Chi, Zhongfei (Mark) Zhang, Yihong Gong
DOI: https://doi.org/10.1145/1571941.1572095
2009-01-01
Abstract: Documents in many corpora, such as digital libraries and webpages, contain both content and link information. To explicitly account for the document relations represented by links, we propose a citation-topic (CT) model, which assumes a probabilistic generative process for corpora. In the CT model, a given document is modeled as a mixture of topic distributions, each of which is borrowed (cited) from a document related to the given document. Moreover, the CT model contains a random process for selecting the related documents according to the structure of the generative model determined by links; the transitivity of the relations among documents is therefore captured. We apply the CT model to the document clustering task, and experimental comparisons against several state-of-the-art approaches demonstrate very promising performance.
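The generative idea in the abstract can be illustrated with a small sampling sketch. This is a simplified reading and not the authors' implementation: it assumes that each word is produced by first picking a related document, then a topic from that document's topic distribution, then a word from that topic. The names `cite_probs`, `doc_topics`, `topic_words`, `sample_word`, and `generate_document`, along with all toy probabilities, are hypothetical. The single selection step shown here does not by itself model the transitivity of relations that the abstract mentions.

```python
import random

# Toy parameters (all values are illustrative assumptions, not from the paper).
# cite_probs[d][c]: probability that document d borrows a topic from document c.
cite_probs = {"d1": {"d1": 0.5, "d2": 0.5}, "d2": {"d2": 1.0}}
# doc_topics[c][z]: topic distribution owned by document c.
doc_topics = {"d1": {"ml": 1.0}, "d2": {"db": 1.0}}
# topic_words[z][w]: word distribution of topic z.
topic_words = {
    "ml": {"model": 0.5, "topic": 0.5},
    "db": {"query": 0.5, "index": 0.5},
}


def draw(dist, rng):
    """Draw one key from a {key: probability} dictionary."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]


def sample_word(doc, rng):
    """Sample one (cited document, topic, word) triple for `doc`."""
    c = draw(cite_probs[doc], rng)   # pick a related (cited) document
    z = draw(doc_topics[c], rng)     # borrow a topic from that document
    w = draw(topic_words[z], rng)    # emit a word from the borrowed topic
    return c, z, w


def generate_document(doc, n_words, seed=0):
    """Generate `n_words` triples for `doc` with a seeded random generator."""
    rng = random.Random(seed)
    return [sample_word(doc, rng) for _ in range(n_words)]
```

Calling `generate_document("d1", 10)` returns a list of ten `(cited_document, topic, word)` triples. Under these toy parameters, `d1` mixes words from its own topic and from the topic of `d2`, while `d2` draws only from its own topic.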