Incorporating the Latent Link Categories in Relational Topic Modeling

Yuan He, Cheng Wang, Changjun Jiang
DOI: https://doi.org/10.1145/3132847.3132881
2017-01-01
Abstract: The rise of social media services has greatly increased the prevalence of document networks, in which documents are not a set of plain texts but nodes in a graph. An observed link connects the two documents at its ends and thus implicitly reflects the semantic association between that document pair. Previous work assumes that only similar documents tend to be connected, which neglects the rich connective patterns in the topological structure. In this paper, we introduce a latent correlation factor that sorts links into several categories, each corresponding to a distinct kind of association. By fitting the data, the model can comprehensively capture relational information such as homophily and heterophily. Using Canonical Correlation Analysis (CCA), we maximize the correlation between all pairs of linked documents. We propose a purely generative model and derive efficient learning algorithms based on variational EM. Experiments on three datasets show that the proposed model is competitive with, and usually better than, state-of-the-art baselines on both topic modeling and link prediction.
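To make the CCA step concrete, here is a minimal sketch of standard linear CCA in NumPy. It assumes that each observed link (d, d') contributes one row to a matrix `X` (a feature vector for document d, e.g. its topic proportions) and the matching row to `Y` (the same kind of vector for d'). This is plain CCA on fixed vectors, not the authors' generative model or its variational EM procedure. The functions `cca` and `inv_sqrt`, the ridge term `reg`, and the synthetic data are all hypothetical illustrations.

```python
import numpy as np


def inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca(X, Y, reg=1e-6):
    """Standard linear CCA between paired views X (n x p) and Y (n x q).

    Hypothetical helper: row i of X and row i of Y are assumed to describe the
    two endpoint documents of the i-th link. Returns the canonical correlations
    (descending) and the projection directions A (for X) and B (for Y).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # center each view
    Yc = Y - Y.mean(axis=0)
    # Covariance blocks; the small ridge `reg` (an assumption) keeps them invertible.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Canonical correlations are the singular values of the whitened cross-covariance.
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy, full_matrices=False)
    A = Wx @ U      # columns: directions that maximize correlation in view X
    B = Wy @ Vt.T   # columns: matching directions in view Y
    return s, A, B


# Synthetic "linked pairs": both endpoints share a 2-D signal Z plus noise.
rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 2))
X = np.column_stack([Z, rng.normal(size=n)]) + 0.1 * rng.normal(size=(n, 3))
Y = np.column_stack([Z[:, ::-1], rng.normal(size=n)]) + 0.1 * rng.normal(size=(n, 3))

corrs, A, B = cca(X, Y)
print(np.round(corrs, 3))
```

Because the two views share two latent dimensions, the first two canonical correlations should be close to 1 and the third close to 0. Projecting each view onto its first canonical direction yields scores whose Pearson correlation equals the first canonical correlation (up to the tiny ridge), which is the "maximized correlation between linked documents" in its simplest linear form.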
What problem does this paper attempt to address?