PoClustering: Lossless Clustering of Dissimilarity Data

Jinze Liu, Qi Zhang, Wei Wang, Leonard McMillan, Jan Prins
DOI: https://doi.org/10.1137/1.9781611972771.61
2007-01-01
Abstract: Given a set of objects V with a dissimilarity measure between pairs of objects in V, a PoCluster is a collection of sets P that is a subset of powerset(V), partially ordered by the subset relation, such that S is a subset of T if the maximal dissimilarity among objects in S is less than the maximal dissimilarity among objects in T. PoClusters capture categorizations of objects that are not strictly hierarchical, such as those found in ontologies. PoClusters cannot, in general, be constructed using hierarchical clustering algorithms. In this paper, we examine the relationship between PoClusters and dissimilarity matrices and prove that PoClusters are in one-to-one correspondence with the set of dissimilarity matrices. The PoClustering problem is NP-complete, and we present a heuristic algorithm for it. Experiments on both synthetic and real datasets demonstrate the quality and scalability of the algorithm.
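To make the definition concrete, the following is a minimal Python sketch, not the paper's heuristic algorithm. It computes the maximal pairwise dissimilarity (here called the diameter) of a set and checks that, within a given collection, every proper subset has a strictly smaller diameter than its superset. The names `diameter` and `respects_subset_order`, the dictionary encoding of the dissimilarity measure, and the toy values are all assumptions made for illustration.

```python
from itertools import combinations


def diameter(subset, dissim):
    """Maximal pairwise dissimilarity among objects in `subset` (0 for singletons)."""
    pairs = combinations(sorted(subset), 2)
    return max((dissim[frozenset(p)] for p in pairs), default=0)


def respects_subset_order(collection, dissim):
    """True if every proper subset in the collection has a strictly smaller diameter."""
    return all(
        diameter(s, dissim) < diameter(t, dissim)
        for s in collection
        for t in collection
        if s < t  # proper-subset test on frozensets
    )


# Hypothetical dissimilarities on V = {a, b, c, d}, keyed by unordered pair.
dissim = {
    frozenset("ab"): 1, frozenset("bc"): 1, frozenset("ac"): 2,
    frozenset("ad"): 3, frozenset("bd"): 3, frozenset("cd"): 3,
}

# {a, b} and {b, c} overlap without nesting, so this collection is not a hierarchy.
clusters = [frozenset("ab"), frozenset("bc"), frozenset("abc"), frozenset("abcd")]
```

In this toy example the sets {a, b} and {b, c} share the object b, which a strictly hierarchical clustering could not express, yet the collection still satisfies the ordering by diameter.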