Abstract: Researchers and practitioners who work with databases often find knowledge discovery and application development cumbersome because of missing data. Although some approaches can tolerate a certain rate of incomplete data, many of them require complete, high-quality data. Therefore, a large number of strategies have been designed to handle missing values, particularly through imputation. Single imputation methods initially succeeded in predicting missing values for specific types of distributions. However, multiple imputation algorithms have remained prevalent for two reasons. They further improve validity by iteratively minimizing bias, and they require less prior knowledge of the underlying distributions. This article carefully reviews the state of the art and proposes a hybrid missing data completion method named Multiple Imputation using Gray-system-theory and Entropy based on Clustering (MIGEC). First, the data instances without missing values are partitioned into several clusters. Then, for each incomplete instance, the proximal cluster is identified using a similarity metric based on Gray System Theory (GST), and the imputed value is obtained over multiple calculations using the information entropy of that cluster. Experimental results on University of California Irvine (UCI) datasets show that MIGEC achieves higher accuracy than other current methods for both numeric and categorical attributes under different missingness mechanisms. Further experiments on real aerospace datasets show that MIGEC is also applicable to that specific domain, generally providing more precise inference and faster convergence than other multiple imputation methods.
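The abstract outlines three steps: cluster the complete instances, find each incomplete instance's proximal cluster with a GST-based similarity, and impute from that cluster using information entropy. The Python sketch below shows one possible reading of those steps. It is not the authors' MIGEC implementation, and every name in it (`impute`, `kmeans`, `grey_grades`, `entropy_weights`, `RHO`) is hypothetical. Assumptions: numeric attributes only, pre-scaled to [0, 1]; plain k-means for the clustering step; Deng's grey relational grade with distinguishing coefficient ρ = 0.5 as the GST similarity; the entropy weight method for weighting attributes inside the proximal cluster; and a single imputation pass instead of the repeated calculations the abstract describes.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value (assumed numeric data in [0, 1])
RHO = 0.5  # distinguishing coefficient; 0.5 is the usual default in grey relational analysis


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows: List[List[float]], k: int, iters: int = 20) -> List[List[List[float]]]:
    """Plain Lloyd's k-means with deterministic, evenly spaced initial centroids."""
    k = min(k, len(rows))
    step = len(rows) // k
    centroids = [list(rows[i * step]) for i in range(k)]
    clusters: List[List[List[float]]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for r in rows:
            best = min(range(k), key=lambda c: sq_dist(r, centroids[c]))
            clusters[best].append(r)
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [cl for cl in clusters if cl]  # drop clusters that ended up empty


def entropy_weights(rows: List[List[float]], cols: List[int]) -> List[float]:
    """Entropy weight method: less uniformly distributed columns get larger weights."""
    n = len(rows)
    if n < 2:
        return [1.0 / len(cols)] * len(cols)
    divergences = []
    for j in cols:
        total = sum(r[j] for r in rows)
        if total == 0:
            divergences.append(0.0)  # an all-zero column carries no information
            continue
        e = -sum((r[j] / total) * math.log(r[j] / total) for r in rows if r[j] > 0)
        divergences.append(max(0.0, 1.0 - e / math.log(n)))
    s = sum(divergences)
    if s == 0:
        return [1.0 / len(cols)] * len(cols)
    return [d / s for d in divergences]


def grey_grades(target: Row, candidates: List[List[float]], cols: List[int],
                weights: List[float]) -> List[float]:
    """Weighted grey relational grade of each candidate w.r.t. target on observed cols."""
    deltas = [[abs(target[j] - c[j]) for j in cols] for c in candidates]
    d_min = min(min(d) for d in deltas)
    d_max = max(max(d) for d in deltas)
    if d_max == 0:
        return [1.0] * len(candidates)  # every candidate matches the target exactly
    return [sum(w * (d_min + RHO * d_max) / (d + RHO * d_max) for w, d in zip(weights, row))
            for row in deltas]


def impute(data: List[Row], k: int = 2) -> List[List[float]]:
    """Single-pass sketch: cluster, pick the proximal cluster, grade-weighted fill."""
    complete = [list(r) for r in data if None not in r]
    if not complete:
        raise ValueError("need at least one complete row to cluster")
    clusters = kmeans(complete, k)
    centroids = [[sum(col) / len(cl) for col in zip(*cl)] for cl in clusters]
    result = []
    for row in data:
        if None not in row:
            result.append(list(row))
            continue
        obs = [j for j, v in enumerate(row) if v is not None]
        if not obs:
            raise ValueError("row has no observed attributes")
        # Proximal cluster: the centroid with the highest equal-weight grey grade.
        cg = grey_grades(row, centroids, obs, [1.0 / len(obs)] * len(obs))
        members = clusters[cg.index(max(cg))]
        # Grey grades against each member, weighted by that cluster's entropy weights.
        g = grey_grades(row, members, obs, entropy_weights(members, obs))
        total = sum(g)
        filled = list(row)
        for j, v in enumerate(row):
            if v is None:
                filled[j] = sum(gi * m[j] for gi, m in zip(g, members)) / total
        result.append(filled)
    return result
```

Calling `impute(rows, k=2)` returns a copy of `rows` in which each `None` is replaced by a grade-weighted average of the proximal cluster's values on that attribute. Re-running the clustering with the imputed rows until the filled values stop changing would be one way to approximate the iterative behavior the abstract attributes to MIGEC. That loop is left out here to keep the sketch short.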

Missing Data Analyses: A Hybrid Multiple Imputation Algorithm Using Gray System Theory and Entropy Based on Clustering