KLNCC: A new nonlinear correlation clustering algorithm based on KL-divergence

Chaofeng Sha, Xipeng Qiu, Aoying Zhou
DOI: https://doi.org/10.1109/CIT.2008.4594661
2008-01-01
Abstract: The problem of finding correlations among subsets of features in high-dimensional data arises in many applications. There has been much work on finding such correlations, including linear and nonlinear correlation clusters. In this paper, we present KLNCC, a novel nonlinear correlation clustering algorithm that adopts a dynamic two-phase approach. In the first phase, we find microclusters with the EM algorithm. In the second phase, these microclusters are merged in a bottom-up manner, resulting in a dendrogram. The final clustering is determined by the user. When merging microclusters, we adopt the KL-divergence as the distance between two microclusters, which has an explicit form when the EM clustering algorithm is used to find the microclusters. Our experimental evaluation on several real datasets demonstrates that KLNCC discovers meaningful and accurate nonlinear correlation clusters.
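The explicit form mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each microclusters is a multivariate Gaussian component (mean and covariance) fitted by EM, which the abstract does not state outright. The function names `gaussian_kl` and `symmetric_kl` are hypothetical, and the symmetrization is an assumption added here so that the value can serve as a merge distance; the paper's exact definition may differ.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL(N0 || N1) between two multivariate Gaussians, in nats."""
    mu0, mu1 = np.asarray(mu0, float), np.asarray(mu1, float)
    cov0, cov1 = np.asarray(cov0, float), np.asarray(cov1, float)
    d = mu0.shape[0]
    diff = mu1 - mu0
    cov1_inv = np.linalg.inv(cov1)
    # slogdet is numerically safer than log(det(...)) in higher dimensions.
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    trace_term = np.trace(cov1_inv @ cov0)
    maha_term = diff @ cov1_inv @ diff
    return 0.5 * (trace_term + maha_term - d + logdet1 - logdet0)

def symmetric_kl(mu0, cov0, mu1, cov1):
    """Assumed symmetrization: KL(N0 || N1) + KL(N1 || N0)."""
    return gaussian_kl(mu0, cov0, mu1, cov1) + gaussian_kl(mu1, cov1, mu0, cov0)

# Two unit-covariance microclusters whose means differ by 1 along one axis.
identity = np.eye(2)
d_forward = gaussian_kl([0.0, 0.0], identity, [1.0, 0.0], identity)
d_sym = symmetric_kl([0.0, 0.0], identity, [1.0, 0.0], identity)
```

In a bottom-up merge, a pairwise matrix of such distances between all microclusters could drive an agglomerative procedure that produces the dendrogram, with the user choosing where to cut it.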