A Robust Fuzzy C-Means Clustering Algorithm for Incomplete Data.

Jinhua Li, Shiji Song, Yuli Zhang, Kang Li
DOI: https://doi.org/10.1007/978-981-10-6373-2_1
2017-01-01
Abstract: Data sets with missing feature values are prevalent in clustering analysis. Most existing clustering methods for incomplete data rely on imputation of the missing feature values. However, accurate imputations are usually hard to obtain, especially for small or highly corrupted data sets. To address this issue, this paper proposes a robust fuzzy c-means (RFCM) clustering algorithm that does not require imputation. The proposed RFCM represents missing feature values by intervals, which can be easily constructed using the K-nearest neighbors method, and adopts a min-max optimization model to reduce the impact of noise on clustering performance. We give an equivalent tractable reformulation of the min-max optimization problem and propose an efficient solution method based on smoothing and gradient projection techniques. Experiments on UCI data sets validate the effectiveness of the proposed RFCM algorithm by comparison with existing clustering methods for incomplete data.
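As an illustration of the interval representation mentioned in the abstract, the following is a minimal Python sketch of one way intervals for missing values might be built from K nearest neighbors. The abstract does not specify the construction, so everything here is an assumption made for illustration: the function name `knn_intervals`, the parameter `k`, the use of a partial distance over jointly observed features, and the choice of the neighbors' minimum and maximum as interval bounds. It is not the authors' implementation.

```python
import numpy as np


def knn_intervals(X, k=3):
    """Return (lower, upper) bounds for every entry of X (hypothetical helper).

    Observed entries get the degenerate interval [x, x]. Each missing entry
    (NaN) gets [min, max] of that feature over the k nearest samples that
    have the feature observed. This construction is an assumption, not
    taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    lower, upper = X.copy(), X.copy()

    for i in range(X.shape[0]):
        missing = np.where(~observed[i])[0]
        if missing.size == 0:
            continue

        # Partial distance: mean squared difference over the features
        # observed in both sample i and the other sample.
        shared = observed & observed[i]
        counts = shared.sum(axis=1)
        diff = np.where(shared, X - X[i], 0.0)
        with np.errstate(divide="ignore", invalid="ignore"):
            dist = (diff ** 2).sum(axis=1) / counts
        dist[counts == 0] = np.inf  # no shared features: not comparable
        dist[i] = np.inf            # a sample is not its own neighbor

        for j in missing:
            candidates = np.where(observed[:, j] & np.isfinite(dist))[0]
            if candidates.size == 0:
                continue  # no usable neighbor: bounds stay NaN
            nearest = candidates[np.argsort(dist[candidates])[:k]]
            lower[i, j] = X[nearest, j].min()
            upper[i, j] = X[nearest, j].max()

    return lower, upper


# Toy data: the second sample is missing its second feature.
X = np.array([[1.0, 2.0],
              [1.1, np.nan],
              [5.0, 9.0],
              [1.2, 2.4]])
lower, upper = knn_intervals(X, k=2)
print(lower[1, 1], upper[1, 1])  # 2.0 2.4
```

In this toy example the two nearest neighbors of the incomplete sample are the first and fourth rows, so the missing value is bounded by their second-feature values rather than replaced by a single imputed number.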