Unbiased Characterization of Node Pairs over Large Graphs

Pinghui Wang,Junzhou Zhao,John C. S. Lui,Don Towsley,Xiaohong Guan
DOI: https://doi.org/10.1145/2700393
IF: 4.157
2015-01-01
ACM Transactions on Knowledge Discovery from Data
Abstract:Characterizing user pair relationships is important for applications such as friend recommendation and interest targeting in online social networks (OSNs). Due to the large-scale nature of such networks, it is infeasible to enumerate all user pairs and thus sampling is used. In this article, we show that it is a great challenge for OSN service providers to characterize user pair relationships, even when they possess the complete graph topology. The reason is that when sampling techniques (i.e., uniform vertex sampling (UVS) and random walk (RW)) are naively applied, they can introduce large biases, particularly for estimating similarity distribution of user pairs with constraints like existence of mutual neighbors, which is important for applications such as identifying network homophily. Estimating statistics of user pairs is more challenging in the absence of the complete topology information, as an unbiased sampling technique like UVS is usually not allowed and exploring the OSN graph topology is expensive. To address these challenges, we present unbiased sampling methods to characterize user pair properties based on UVS and RW techniques. We carry out an evaluation of our methods to show their accuracy and efficiency. Finally, we apply our methods to three OSNs—Foursquare, Douban, and Xiami—and discover that significant homophily is present in these networks.
What problem does this paper attempt to address?