A hybrid clustering algorithm based on missing attribute interval estimation for incomplete data

Li Zhang, Zhaohong Bing, Liyong Zhang
DOI: https://doi.org/10.1007/s10044-014-0376-8
IF: 2.307
2014-06-01
Pattern Analysis and Applications
Abstract: Data sets with partially missing values are a common problem in cluster analysis. We propose a hybrid algorithm that combines fuzzy clustering with particle swarm optimization (PSO) for clustering incomplete data, in which missing attributes are represented as intervals. We also develop a neighbor interval reconstruction (NIR) method based on pre-classification results, which estimates the nearest-neighbor interval of each missing attribute using the nearest-neighbor rule. This prevents interval endpoints from being determined by samples of different classes, which improves the accuracy of the missing attribute intervals and makes missing attribute imputation more robust. The hybrid of PSO and fuzzy c-means is then used to cluster the resulting interval-valued data set, and the global optimization ability of PSO improves the accuracy of the clustering results compared with gradient-based optimization methods. Experimental results on several UCI data sets show the superiority of the proposed NIR hybrid algorithm.
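The interval estimation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the neighbor count `q`, the use of NaN to mark missing values, and the partial-distance measure are all assumptions made for illustration, since the abstract does not specify them. Each missing attribute is replaced by the interval spanned by that attribute's values among the sample's nearest neighbors, and an optional `labels` argument (standing in for the pre-classification result) restricts neighbors to the same class.

```python
import numpy as np

def partial_distance(a, b):
    """Distance over attributes observed in both vectors (NaN = missing),
    rescaled to the full dimension. Assumed measure, not from the abstract."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return np.inf
    diff = a[mask] - b[mask]
    return np.sqrt(a.size / mask.sum() * np.sum(diff ** 2))

def nearest_neighbor_intervals(X, q=2, labels=None):
    """Return (lower, upper) arrays. Observed entries become degenerate
    intervals; missing entries get [min, max] over the q nearest neighbors
    that have the attribute observed. `labels` is a hypothetical
    pre-classification used to restrict the neighbor search."""
    X = np.asarray(X, dtype=float)
    n, _ = X.shape
    lower, upper = X.copy(), X.copy()
    for i in range(n):
        missing = np.isnan(X[i])
        if not missing.any():
            continue
        # Candidate neighbors: every other sample, optionally same class only.
        cand = [k for k in range(n)
                if k != i and (labels is None or labels[k] == labels[i])]
        cand.sort(key=lambda k: partial_distance(X[i], X[k]))
        for j in np.where(missing)[0]:
            vals = [X[k, j] for k in cand if not np.isnan(X[k, j])][:q]
            if vals:  # entry stays NaN if no neighbor has attribute j
                lower[i, j], upper[i, j] = min(vals), max(vals)
    return lower, upper
```

With a small `q`, the interval stays tight around the nearest samples. With a larger `q` and no class restriction, a neighbor from another cluster can stretch the interval, which is the effect the pre-classification step is meant to avoid.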
computer science, artificial intelligence