Fast and robust K-means clustering via feature learning on high-dimensional data

Xiaodong Wang, Rung-Ching Chen, Fei Yan
DOI: https://doi.org/10.1109/ICAwST.2017.8256444
2017-01-01
Abstract: K-means is an efficient method that has achieved empirical success in a wide range of applications. However, it struggles with high-dimensional data, which often contain noise and redundant features. Although existing methods try to address this problem through dimension reduction or by introducing a robust loss function, they still have two limitations. On the one hand, they usually rely on eigenvalue decomposition to obtain the transformation matrix, which is computationally expensive. On the other hand, extensions with a robust loss function measure similarity in the original feature space, which leaves them sensitive to outliers. To solve these problems, we propose a fast and robust subspace clustering algorithm. The proposed algorithm combines a group sparsity loss function and feature selection in a joint framework, which reduces the effect of outliers. Moreover, within this framework, the optimal feature subset can be computed without eigenvalue decomposition, so the algorithm can be applied to high-dimensional data. Experimental results on several benchmark datasets demonstrate the advantage of the proposed model.
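The abstract does not state the objective function, so the following is only a minimal sketch of why a group sparsity loss can reduce the effect of outliers. It assumes the loss takes the common l2,1-norm form (a sum of unsquared Euclidean distances to the assigned centers) and compares it with the standard squared K-means loss. The function names, the toy data, and the choice of l2,1 norm are illustrative assumptions, not the authors' formulation, and the feature selection step is not shown.

```python
import math

def squared_loss(points, centers, labels):
    """Standard K-means objective: sum of squared Euclidean distances."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(p, centers[k]))
        for p, k in zip(points, labels)
    )

def l21_loss(points, centers, labels):
    """Assumed group-sparsity (l2,1-norm) objective:
    sum of unsquared Euclidean distances to the assigned centers."""
    return sum(
        math.sqrt(sum((x - c) ** 2 for x, c in zip(p, centers[k])))
        for p, k in zip(points, labels)
    )

# Hypothetical toy data: two inliers near the center and one far outlier.
points = [(0.0, 0.0), (0.0, 2.0), (30.0, 41.0)]
centers = [(0.0, 1.0)]
labels = [0, 0, 0]  # every point is assigned to the single center

sq = squared_loss(points, centers, labels)
l21 = l21_loss(points, centers, labels)

# The outlier lies at distance 50 from the center, each inlier at distance 1.
outlier_share_sq = 50.0 ** 2 / sq
outlier_share_l21 = 50.0 / l21

print(f"squared loss: {sq}, outlier share: {outlier_share_sq:.4f}")
print(f"l2,1 loss:    {l21}, outlier share: {outlier_share_l21:.4f}")
```

In this toy case the outlier contributes a smaller fraction of the total under the unsquared loss than under the squared loss, so it pulls less on the fitted center. This is the general motivation for robust losses of this kind.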