Modeling Document Networks With Tree-Averaged Copula Regularization
Yuan He, Cheng Wang, Changjun Jiang
DOI: https://doi.org/10.1145/3018661.3018666
2017-01-01
Abstract: A document network is an intriguing kind of dataset that provides both topical (text) and topological (link) information. Most previous work assumes that documents closely linked with each other share common topics. However, the associations among documents are usually complex and are not limited to homophily (i.e., the tendency to link to similar others). Heterophily (i.e., the tendency to link to different others) is another pervasive phenomenon in social networks. In this paper, we introduce a new tool, the copula, to model the documents and links separately, so that different copula functions can be applied to capture different correlation patterns. In statistics, a copula is a powerful framework for explicitly modeling the dependence of random variables by separating the marginals from their correlations. Though widely used in economics, copulas have received little attention from researchers in the machine learning field. In addition, to capture the potential associations among unconnected documents, we apply a tree-averaged copula instead of a single copula function. This improvement gives our model greater expressive power and makes it algebraically more elegant. We derive efficient EM algorithms to estimate the model parameters, and evaluate the performance of our model on three different datasets. Experimental results show that our approach achieves significant improvements on both topic and link modeling compared with the current state of the art.
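As a rough illustration of the idea that a copula separates marginals from their dependence, the sketch below draws pairs from a bivariate Gaussian copula using only the Python standard library. It is not the model proposed in the paper: the choice of a Gaussian copula, the function names `sample_gaussian_copula` and `pearson`, and the parameter `rho` are all assumptions made for illustration. Both coordinates keep the same uniform marginal, while the sign of `rho` switches between positive dependence (loosely analogous to homophily) and negative dependence (loosely analogous to heterophily).

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(rho, n, seed=0):
    """Draw n pairs (u, v) with Uniform(0, 1) marginals coupled by a
    bivariate Gaussian copula with correlation parameter rho.
    (Illustrative helper, not taken from the paper.)"""
    rng = random.Random(seed)
    std_normal = NormalDist()  # standard normal, used for its CDF
    pairs = []
    for _ in range(n):
        # Two standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # The normal CDF maps each coordinate to a uniform marginal,
        # leaving only the dependence structure (the copula).
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation of a list of (u, v) pairs."""
    n = len(pairs)
    mean_u = sum(u for u, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    cov = sum((u - mean_u) * (v - mean_v) for u, v in pairs)
    var_u = sum((u - mean_u) ** 2 for u, _ in pairs)
    var_v = sum((v - mean_v) ** 2 for _, v in pairs)
    return cov / math.sqrt(var_u * var_v)


# Same marginals, opposite dependence patterns.
positive = sample_gaussian_copula(rho=0.8, n=5000)
negative = sample_gaussian_copula(rho=-0.8, n=5000)
print("positive dependence:", round(pearson(positive), 2))
print("negative dependence:", round(pearson(negative), 2))
```

Swapping in a different copula family would change the dependence pattern without touching the marginals, which is the property the abstract relies on when it speaks of applying different copula functions to different correlation patterns.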