Semi-supervised Hierarchical Clustering Analysis for High Dimensional Data

Yuntao Qian, Xiaoxu Du, Qi Wang
2006-01-01
Abstract: In many data mining tasks, unlabeled data is plentiful but labeled data is limited because labels are expensive to generate. A number of semi-supervised clustering algorithms have therefore been proposed, but few of them are specifically designed for high-dimensional data. High dimensionality poses a serious challenge for clustering analysis because the data are inherently sparsely distributed, and most popular clustering algorithms, including semi-supervised ones, become ineffective in high-dimensional space. In this paper, a semi-supervised hierarchical clustering algorithm for high-dimensional data is proposed, based on a combination of semi-supervised clustering and dimensionality reduction. To keep dimensionality reduction closely aligned with the detection of the inherent cluster structure, the number of dimensions is reduced sequentially as the clusters are gradually formed during the hierarchical clustering procedure. The experimental results show the effectiveness of our method.
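The idea of interleaving dimensionality reduction with cluster formation can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes PCA as the reduction step, centroid-linkage agglomerative merging, supervision given as cannot-link pairs, and a hypothetical `dim_schedule` that maps a remaining cluster count to a target dimensionality. All function and parameter names are illustrative.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto their top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def violates(members_a, members_b, cannot):
    """True if merging the two clusters would join a cannot-link pair."""
    return any(frozenset((i, j)) in cannot for i in members_a for j in members_b)

def semi_supervised_hac(X, n_clusters, cannot_link=(), dim_schedule=None):
    """Agglomerative clustering that shrinks the feature space as it merges.

    dim_schedule (an assumption of this sketch): {clusters_remaining: n_dims}.
    When the number of clusters hits a key, X is re-projected to n_dims.
    """
    n = X.shape[0]
    clusters = {i: [i] for i in range(n)}          # cluster id -> member indices
    cannot = {frozenset(p) for p in cannot_link}
    dim_schedule = dim_schedule or {}
    Z = X.copy()                                   # current working representation

    while len(clusters) > n_clusters:
        k = len(clusters)
        if k in dim_schedule:
            Z = pca_project(X, dim_schedule[k])    # sequential reduction step
        ids = list(clusters)
        centroids = {c: Z[clusters[c]].mean(axis=0) for c in ids}
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                if violates(clusters[a], clusters[b], cannot):
                    continue                       # respect the supervision
                dist = np.linalg.norm(centroids[a] - centroids[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        if best is None:
            break                                  # no admissible merge remains
        _, a, b = best
        clusters[a].extend(clusters.pop(b))

    labels = np.empty(n, dtype=int)
    for new_id, members in enumerate(clusters.values()):
        labels[members] = new_id
    return labels

# Usage: two well-separated synthetic blobs in 10 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (10, 10)), rng.normal(5.0, 0.1, (10, 10))])
labels = semi_supervised_hac(X, n_clusters=2, dim_schedule={10: 5, 4: 2})
```

In this sketch the early merges use the full feature space, and later merges use progressively fewer principal components, so the representation is coarsened only after some cluster structure has already formed.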