Generalization of Clustering Agreements and Distances for Overlapping Clusters and Network Communities

Reihaneh Rabbany,Osmar R. Zaïane
DOI: https://doi.org/10.1007/s10618-015-0426-x
2015-03-06
Abstract:A measure of distance between two clusterings has important applications, including clustering validation and ensemble clustering. Generally, such distance measure provides navigation through the space of possible clusterings. Mostly used in cluster validation, a normalized clustering distance, a.k.a. agreement measure, compares a given clustering result against the ground-truth clustering. Clustering agreement measures are often classified into two families of pair-counting and information theoretic measures, with the widely-used representatives of Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI), respectively. This paper sheds light on the relation between these two families through a generalization. It further presents an alternative algebraic formulation for these agreement measures which incorporates an intuitive clustering distance, which is defined based on the analogous between cluster overlaps and co-memberships of nodes in clusters. Unlike the original measures, it is easily extendable for different cases, including overlapping clusters and clusters of inter-related data for complex networks. These two extensions are, in particular, important in the context of finding clusters in social and information networks, a.k.a communities.
Social and Information Networks,Physics and Society
What problem does this paper attempt to address?