Research on Incomplete Data Clustering

Yonglin Leng, Qingchen Zhang, Fuyu Lu
DOI: https://doi.org/10.13537/j.issn.1004-3918.2014.11.016
2014-01-01
Abstract: Data collection often produces records with missing values, known as incomplete data. Traditional methods for clustering incomplete data either impute the missing values or discard the affected records before clustering. In this paper, we propose a K-means clustering algorithm for incomplete data based on incomplete information system theory. The algorithm first divides the data set into a complete subset and an incomplete subset, and clusters the complete subset with the K-means algorithm. The incomplete records are then assigned to the corresponding clusters according to a similarity measure designed for this purpose. Experiments demonstrate that the proposed algorithm can cluster incomplete big data directly and improves both accuracy and effectiveness.
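The two-stage procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract does not specify the similarity measure, so the sketch assumes a partial distance: the mean squared difference over only the attributes that are observed. The function names (`partial_distance`, `kmeans`, `cluster_incomplete`), the use of `None` to mark a missing value, and the farthest-first centroid initialization are all assumptions made for the example. It also assumes the data set contains at least `k` complete records.

```python
import math


def partial_distance(row, centroid):
    """Mean squared difference over the observed (non-None) attributes of `row`.

    Assumed stand-in for the paper's similarity measure, which the
    abstract does not define.
    """
    observed = [(x, c) for x, c in zip(row, centroid) if x is not None]
    if not observed:
        return math.inf  # a record with no observed attribute matches nothing
    return sum((x - c) ** 2 for x, c in observed) / len(observed)


def kmeans(rows, k, max_iterations=100):
    """Plain K-means on complete rows, with deterministic farthest-first seeding."""
    # Assumption: seed with the first row, then repeatedly add the row
    # farthest from all centroids chosen so far.
    centroids = [list(rows[0])]
    while len(centroids) < k:
        farthest = max(
            rows, key=lambda r: min(partial_distance(r, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iterations):
        # Assignment step: nearest centroid for every row.
        new_labels = [
            min(range(k), key=lambda j: partial_distance(r, centroids[j]))
            for r in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def cluster_incomplete(rows, k):
    """Cluster the complete rows first, then assign the incomplete rows."""
    complete = [i for i, r in enumerate(rows) if all(x is not None for x in r)]
    incomplete = [i for i, r in enumerate(rows) if any(x is None for x in r)]

    centroids, complete_labels = kmeans([rows[i] for i in complete], k)

    labels = [None] * len(rows)
    for i, lab in zip(complete, complete_labels):
        labels[i] = lab
    # Incomplete rows join the cluster whose centroid is closest on the
    # attributes they do have; the centroids are not re-estimated here.
    for i in incomplete:
        labels[i] = min(
            range(k), key=lambda j: partial_distance(rows[i], centroids[j])
        )
    return labels, centroids


if __name__ == "__main__":
    data = [
        [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
        [10.0, 10.0], [10.1, 9.9], [9.9, 10.2],
        [None, 0.1], [10.0, None],  # records with a missing attribute
    ]
    labels, centroids = cluster_incomplete(data, k=2)
    print(labels)
```

In this sketch the incomplete records never influence the centroids, which matches the abstract's description of clustering the complete subset first. Whether the paper updates the clusters after assigning incomplete records is not stated in the abstract.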