Clustering with Relative Constraints

Eric Yi Liu,Zhaojun Zhang,Wei Wang
DOI: https://doi.org/10.1145/2020408.2020564
2011-01-01
Abstract:Recent studies have suggested using relative distance comparisons as constraints to represent domain knowledge. A natural extension to relative comparisons is the combination of two comparisons defined on the same set of three instances. Constraints in this form, termed Relative Constraints, provide a unified knowledge representation for both partitional and hierarchical clusterings. But many key properties of relative constraints remain unknown. In this paper, we answer the following important questions that enable the broader application of relative constraints in general clustering problems: " Feasibility: Does there exist a clustering that satisfies a given set of relative constraints? (consistency of constraints) "Completeness: Given a set of consistent relative constraints, how can one derive a complete clustering without running into dead-ends? " Informativeness: How can one extract the most informative relative constraints from given knowledge sources? We show that any hierarchical domain knowledge can be easily represented by relative constraints. We further present a hierarchical algorithm that finds a clustering satisfying all given constraints in polynomial time. Experiments showed that our algorithm achieves significantly higher accuracy than the existing metric learning approach based on relative comparisons.
What problem does this paper attempt to address?