Hierarchical Information-Theoretic Co-Clustering For High Dimensional Data

Yuanyuan Wang, Yunming Ye, Xutao Li, Michael K. Ng, Joshua Huang
2011-01-01
Abstract: Hierarchical clustering is an important technique for hierarchical data exploration applications. However, most existing hierarchical methods are based on traditional one-sided clustering, which is not effective for high-dimensional data. In this paper, we develop a partitional hierarchical co-clustering framework and propose a Hierarchical Information-Theoretic Co-Clustering (HITCC) algorithm. The algorithm performs a series of binary partitions of the objects in a data set via the Information-Theoretic Co-Clustering (ITCC) procedure and thereby builds a hierarchy of object clusters. Because features and objects are clustered simultaneously while the cluster tree is built, HITCC can identify subspace clusters at different levels of abstraction and produce good clustering hierarchies. In comparisons with the flat ITCC algorithm and six state-of-the-art hierarchical clustering algorithms on various data sets, the new algorithm performed much better.
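The binary-partition scheme described in the abstract can be sketched as follows. This is a minimal, non-authoritative pure-Python sketch, not the authors' HITCC implementation. Each tree node's objects are split into two row clusters by a small ITCC that uses the alternating updates of the original ITCC algorithm (Dhillon, Mallela and Modha, 2003). These updates reduce the mutual-information loss I(X;Y) - I(X̂;Ŷ) by moving each object (row) or feature (column) to the cluster whose co-cluster-based distribution is closest in KL divergence. The farthest-first initialization, the number of feature clusters per split (`k_cols`), the stopping rules (`max_depth`, `min_size`), and all function names (`itcc`, `hitcc_tree`, and the helpers) are hypothetical choices for illustration; the abstract does not specify them.

```python
import math

EPS = 1e-12  # floor for q so log() stays finite when a co-cluster block is empty


def _conditionals(M):
    """Normalize each row of M to a conditional distribution; all-zero rows stay zero."""
    out = []
    for r in M:
        s = sum(r)
        out.append([v / s for v in r] if s > 0 else [0.0] * len(r))
    return out


def _farthest_first(dists, k):
    """Assumed seeding: pick k mutually distant rows (L1), then assign rows to the nearest seed."""
    def l1(a, b):
        return sum(abs(u - v) for u, v in zip(a, b))
    seeds = [0]
    while len(seeds) < min(k, len(dists)):
        seeds.append(max(range(len(dists)),
                         key=lambda i: min(l1(dists[i], dists[s]) for s in seeds)))
    return [min(range(len(seeds)), key=lambda c: l1(d, dists[seeds[c]])) for d in dists]


def _reassign(P, rows, cols, k_rows, k_cols):
    """One batch ITCC row step: x -> argmin_c KL(p(Y|x) || q(Y|c)),
    where q(y|c) = p(y|yhat(y)) * p(yhat(y)|c)."""
    py = [sum(c) for c in zip(*P)]
    J = [[0.0] * k_cols for _ in range(k_rows)]  # joint mass p(xhat, yhat), unnormalized
    for x, r in enumerate(P):
        for y, v in enumerate(r):
            J[rows[x]][cols[y]] += v
    pxh = [sum(r) for r in J]
    pyh = [sum(c) for c in zip(*J)]
    new_rows = []
    for x, r in enumerate(P):
        px = sum(r)
        if px == 0:
            new_rows.append(rows[x])  # no mass: keep the current assignment
            continue
        costs = []
        for c in range(k_rows):
            if pxh[c] == 0:
                costs.append(math.inf)  # empty cluster: q(.|c) is undefined
                continue
            kl = 0.0
            for y, v in enumerate(r):
                if v > 0:
                    q = (py[y] / pyh[cols[y]]) * (J[c][cols[y]] / pxh[c])
                    kl += (v / px) * math.log((v / px) / max(q, EPS))
            costs.append(kl)
        new_rows.append(min(range(k_rows), key=costs.__getitem__))
    return new_rows


def itcc(P, k_rows, k_cols, max_iter=50):
    """Flat ITCC: alternate row and column reassignment until nothing changes."""
    PT = [list(c) for c in zip(*P)]
    rows = _farthest_first(_conditionals(P), k_rows)
    cols = _farthest_first(_conditionals(PT), k_cols)
    for _ in range(max_iter):
        new_rows = _reassign(P, rows, cols, k_rows, k_cols)
        # The column step is the row step applied to the transposed matrix.
        new_cols = _reassign(PT, cols, new_rows, k_cols, k_rows)
        if new_rows == rows and new_cols == cols:
            break
        rows, cols = new_rows, new_cols
    return rows, cols


def hitcc_tree(P, objects=None, k_cols=2, min_size=2, max_depth=3):
    """Hypothetical divisive driver: bisect objects with 2-way ITCC, recurse on each side.
    Returns nested tuples whose leaves are lists of object (row) indices."""
    if objects is None:
        objects = list(range(len(P)))
    if max_depth == 0 or len(objects) < min_size:
        return objects
    rows, _ = itcc([P[i] for i in objects], 2, k_cols)
    left = [o for o, r in zip(objects, rows) if r == 0]
    right = [o for o, r in zip(objects, rows) if r == 1]
    if not left or not right:
        return objects  # ITCC did not separate this node, so keep it as a leaf
    return (hitcc_tree(P, left, k_cols, min_size, max_depth - 1),
            hitcc_tree(P, right, k_cols, min_size, max_depth - 1))


if __name__ == "__main__":
    # Toy block matrix: objects 0 and 2 use features 0-1, objects 1 and 3 use features 2-3.
    P = [[5, 5, 0, 0], [0, 0, 5, 5], [4, 6, 0, 0], [0, 0, 6, 4]]
    print(itcc(P, 2, 2))    # ([0, 1, 0, 1], [0, 0, 1, 1])
    print(hitcc_tree(P))    # ([0, 2], [1, 3])
```

Because each split reruns ITCC on only the objects at that node, the feature clusters are recomputed per node. This is one plausible reading of how a hierarchy of binary co-clusterings can expose different feature subspaces at different tree levels.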