HighSim: Highly Effective Similarity Measurement in Large Heterogeneous Information Networks

J. Yu, Philip S. Yu, C. Faloutsos, Wei Wang, J. Pei, G. Weikum, H. Garcia-Molina, R. Ramakrishnan, Yu, J. Naughton, H. Kriegel, Clement T. Yu, H. Jagadish
2016-01-01
Abstract: Heterogeneous information networks contain rich information in the form of many typed objects and typed links. Mining useful knowledge from large information networks has attracted the attention of many researchers. Well-known ranking algorithms such as P-PageRank, PathSim, and SimRank have been proposed to find the top-k most similar objects. However, SimRank has very high computational complexity, while PathSim measures similarity along only a single meta path. In this paper, we develop HighSim, a novel algorithm that integrates PathSim with the basic methodology of the LINE algorithm to improve similarity ranking by considering both the research topics of different authors and the venues of their published papers. Specifically, we use PathSim on the meta path Author-Paper-Venue-Paper-Author (APVPA) to measure how similar the publication venues of two authors are, and we use LINE to find authors with similar research topics through the papers they cite, i.e., their references. We then evaluate the new algorithm on a bibliographic network dataset extracted from DBLP. The results show the effectiveness and flexibility of the proposed algorithm.
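The venue-similarity step can be illustrated with a minimal sketch, assuming a toy in-memory representation: one dictionary mapping each author to paper ids and another mapping each paper id to its venue. The function names and the data are hypothetical. The score is the standard PathSim definition for a symmetric meta path, s(x, y) = 2·M[x][y] / (M[x][x] + M[y][y]), where M[x][y] counts APVPA path instances between authors x and y. The sketch covers only this PathSim component, not the LINE component or the way HighSim combines the two.

```python
"""Minimal PathSim sketch over the Author-Paper-Venue-Paper-Author (APVPA) meta path."""


def author_venue_counts(author_paper, paper_venue):
    """Return W_AV as nested dicts: counts[a][v] = number of a's papers at venue v.

    This is the product W_AP * W_PV of the author-paper and paper-venue
    adjacency matrices, stored sparsely.
    """
    counts = {}
    for author, papers in author_paper.items():
        row = counts.setdefault(author, {})
        for paper in papers:
            venue = paper_venue.get(paper)
            if venue is not None:
                row[venue] = row.get(venue, 0) + 1
    return counts


def path_count(counts, x, y):
    """Number of APVPA path instances from x to y: M[x][y] = sum_v W_AV[x][v] * W_AV[y][v]."""
    row_x, row_y = counts.get(x, {}), counts.get(y, {})
    return sum(n * row_y.get(venue, 0) for venue, n in row_x.items())


def pathsim(counts, x, y):
    """PathSim score s(x, y) = 2 * M[x][y] / (M[x][x] + M[y][y]); 0.0 if both have no paths."""
    denom = path_count(counts, x, x) + path_count(counts, y, y)
    if denom == 0:
        return 0.0
    return 2 * path_count(counts, x, y) / denom


def top_k(counts, query, k):
    """Rank all other authors by PathSim to `query`; ties are broken by name."""
    scored = [(pathsim(counts, query, other), other) for other in counts if other != query]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


# Hypothetical toy data: author -> paper ids, paper id -> venue.
author_paper = {"alice": ["p1", "p2", "p3"], "bob": ["p4", "p5"], "carol": ["p6"]}
paper_venue = {"p1": "KDD", "p2": "KDD", "p3": "SIGMOD", "p4": "KDD", "p5": "SIGMOD", "p6": "VLDB"}

counts = author_venue_counts(author_paper, paper_venue)
print(top_k(counts, "alice", 2))
```

In this toy example alice and bob both publish at KDD and SIGMOD, so bob ranks first, while carol publishes only at VLDB and scores 0. Normalizing by the self-path counts M[x][x] and M[y][y] keeps a highly prolific author from appearing similar to everyone, which ranking by raw path counts would not prevent.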