Ranking-based Clustering on General Heterogeneous Information Networks by Network Projection.

Chuan Shi, Ran Wang, Yitong Li, Philip S. Yu, Bin Wu
DOI: https://doi.org/10.1145/2661829.2662040
2014-01-01
Abstract: Heterogeneous information network analysis, which models networked data as networks containing different types of objects and relations, has recently attracted increasing attention. Many data mining tasks have been explored on heterogeneous networks, among which clustering and ranking are two basic tasks. These two tasks are usually performed separately, whereas recent research shows that they can mutually enhance each other. Unfortunately, existing work of this kind is limited to heterogeneous networks with special structures (e.g., bipartite or star-schema networks). Real data are more complex and irregular, so it is desirable to design a general method that handles objects and relations in heterogeneous networks with arbitrary schema. In this paper, we study the ranking-based clustering problem in a general heterogeneous information network and propose a novel solution, HeProjI. HeProjI projects a general heterogeneous network into a sequence of sub-networks, and an information transfer mechanism is designed to maintain consistency among the sub-networks. For each sub-network, a path-based random walk model is built to estimate the reachable probability of objects, which can be used for clustering and ranking analysis. Iteratively analyzing each sub-network leads to effective ranking-based clustering. Extensive experiments on three real datasets show that HeProjI achieves better clustering and ranking performance than other well-established algorithms.
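The abstract does not give the details of the path-based random walk model, so the following is only a minimal sketch of the general idea of a reachable probability along a typed path, not the HeProjI algorithm itself. It assumes a toy author-paper-venue network; the function names (`row_normalize`, `matmul`, `reachable_probability`) and the data are hypothetical and chosen for illustration.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1 so it acts as a transition distribution."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def reachable_probability(relations):
    """Probability that a random walker starting at each source object
    ends at each target object after following the given relations in order.
    This is an illustrative assumption, not the paper's exact model."""
    result = row_normalize(relations[0])
    for relation in relations[1:]:
        result = matmul(result, row_normalize(relation))
    return result


# Hypothetical toy data: 3 authors, 2 papers, 2 venues.
author_paper = [[1, 0], [1, 1], [0, 1]]  # rows: authors, columns: papers
paper_venue = [[1, 0], [0, 1]]           # rows: papers, columns: venues

# Walk along the path author -> paper -> venue.
reach = reachable_probability([author_paper, paper_venue])
for author_index, row in enumerate(reach):
    print(author_index, row)
```

In this toy setting, each row of `reach` is a distribution over venues for one author; such distributions could serve as features for clustering or as scores for ranking, which is the role the abstract assigns to the reachable probability.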