Avgsim: relevance measurement on massive data in heterogeneous networks

DING XIAO, XIAOFENG MENG, LI YITONG, CHUAN SHI, WU BIN
2016-01-01
Abstract: A heterogeneous information network contains multiple types of objects and multiple types of links. A homogeneous information network contains only objects of a single type. By comparison, a heterogeneous information network carries richer semantic information. Such networks are common in daily life, and social networks are a typical example. Similarity search in heterogeneous information networks can therefore uncover more precise knowledge. However, real social networks such as Sina Microblog and Facebook hold huge amounts of data, which makes similarity search considerably harder. Many existing methods can only measure similarity between objects of the same type. Moreover, limited memory restricts the amount of data they can process, so they cannot be applied to real relation networks in practice. In this paper, we propose a novel measure, called AvgSim, which can measure the similarity between the objects at the two ends of any search path in a heterogeneous information network. In addition, we implement AvgSim with parallel computing so that it can handle massive data and be applied to real networks. Experiments on real datasets verify the effectiveness and efficiency of the proposed algorithm.
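The abstract does not give AvgSim's formula. The sketch below is only an illustration of how a path-based relevance score between objects of *different* types can be computed. It assumes that a search path is given as a sequence of adjacency matrices between consecutive object types, for example author–paper followed by paper–venue. It then scores each pair of endpoint objects by averaging two quantities: the reach probability of a random walk constrained to the path, and the reach probability of the walk along the reversed path. The function names (`row_normalize`, `path_walk`, `avg_path_score`) and the toy matrices are hypothetical. This is not presented as the paper's definition of AvgSim.

```python
import numpy as np

def row_normalize(m):
    """Turn an adjacency matrix into a transition matrix (rows sum to 1; empty rows stay 0)."""
    m = np.asarray(m, dtype=float)
    sums = m.sum(axis=1, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m), where=sums != 0)

def path_walk(adjs):
    """Reach probabilities of a random walk forced to follow the given sequence of relations."""
    result = row_normalize(adjs[0])
    for a in adjs[1:]:
        result = result @ row_normalize(a)
    return result

def avg_path_score(adjs):
    """Hypothetical symmetric score: mean of the forward walk and the walk on the reversed path."""
    adjs = [np.asarray(a, dtype=float) for a in adjs]
    forward = path_walk(adjs)                                # shape: (first type, last type)
    backward = path_walk([a.T for a in reversed(adjs)])      # shape: (last type, first type)
    return 0.5 * (forward + backward.T)

# Toy path: author -> paper -> venue (2 authors, 3 papers, 2 venues).
AP = np.array([[1, 1, 0], [0, 1, 1]])
PV = np.array([[1, 0], [1, 0], [0, 1]])
scores = avg_path_score([AP, PV])   # scores[i, j]: relevance of author i to venue j
```

Here each relation step is a matrix product, so the whole path score is a chain of (sparse) matrix multiplications. That general property is why path-based measures of this kind are often computed with parallel matrix routines. The abstract does not describe the paper's own parallelization scheme.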