Abstract: In practice, missing values often carry meaningful information that should be recovered or analyzed before an incomplete dataset is used for machine learning tasks. This work proposes two algorithms, jointly fuzzy C-Means and vaguely quantified nearest neighbor (VQNN) imputation (JFCM-VQNNI) and jointly fuzzy C-Means and fitted VQNN imputation (JFCM-FVQNNI), which combine clustering with thorough extraction of uncertain information. In both algorithms, the missing value is treated as a decision feature, and a prediction is generated for each object that contains at least one missing value. Specifically, JFCM-VQNNI uses indistinguishable matrices, tolerance relations, and fuzzy membership relations to identify the closest candidate fill values based on the corresponding similar objects and related clusters. Building on JFCM-VQNNI, JFCM-FVQNNI synthetically analyzes the fuzzy membership of the dependent features for instances with respect to each cluster. To fill the missing values more accurately, JFCM-FVQNNI adjusts the fuzzy decision membership of each object with respect to its related clusters by considering highly relevant decision attributes. Experiments were carried out on five datasets. Based on root-mean-square error, mean absolute error, comparison of imputed values with actual values, and classification accuracy, we conclude that the proposed JFCM-FVQNNI and JFCM-VQNNI algorithms yield sufficient and reasonable imputation performance compared with the fuzzy C-Means parameter-based imputation algorithm and the fuzzy C-Means rough parameter-based imputation algorithm.
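
The idea of treating a missing value as a decision feature predicted from similar objects can be illustrated with a much simpler stand-in. The sketch below is not JFCM-VQNNI or JFCM-FVQNNI: it omits the fuzzy C-Means clustering, tolerance relations, and vaguely quantified memberships entirely and uses a plain distance-weighted nearest-neighbor average instead. The function name `impute_knn`, the parameter `k`, and the use of `None` to mark missing entries are assumptions made for illustration only.

```python
import math


def impute_knn(rows, k=2):
    """Fill None entries with a distance-weighted mean over the k nearest complete rows.

    Illustrative stand-in only: each missing feature is treated as the value to
    predict, and distances are computed on the features the row does have.
    """
    complete = [r for r in rows if None not in r]
    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        observed = [i for i, v in enumerate(row) if v is not None]
        # Euclidean distance to every complete row, restricted to observed features.
        scored = sorted(
            (math.sqrt(sum((row[i] - c[i]) ** 2 for i in observed)), idx)
            for idx, c in enumerate(complete)
        )
        neighbors = scored[:k]
        new_row = list(row)
        if neighbors:  # with no complete rows, the gaps are left as None
            # Closer neighbors get larger weights; the epsilon avoids division by zero.
            weights = [1.0 / (d + 1e-9) for d, _ in neighbors]
            total = sum(weights)
            for j, v in enumerate(row):
                if v is None:
                    new_row[j] = sum(
                        w * complete[idx][j]
                        for w, (_, idx) in zip(weights, neighbors)
                    ) / total
        filled.append(new_row)
    return filled


data = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 5.0],
    [10.0, 20.0, 30.0],
    [1.0, 2.0, None],  # missing third feature
]
result = impute_knn(data, k=2)
```

In this toy input the incomplete row matches the first two rows on its observed features, so its missing entry is filled from their values rather than from the distant third row.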

A hybrid genetic algorithm–fuzzy c-means approach for incomplete data clustering based on nearest-neighbor intervals

A Global Clustering Approach Using Hybrid Optimization for Incomplete Data Based on Interval Reconstruction of Missing Value

A hybrid clustering algorithm based on missing attribute interval estimation for incomplete data

Interval Kernel Fuzzy C-Means Clustering of Incomplete Data

Fuzzy C-Means Clustering of Incomplete Data Based on Probabilistic Information Granules of Missing Values

A Robust Fuzzy C-Means Clustering Algorithm for Incomplete Data

An Interval Weighed Fuzzy C-Means Clustering By Genetically Guided Alternating Optimization

A generalized fuzzy clustering framework for incomplete data by integrating feature weighted and kernel learning

K-Nearest Neighbor Intervals Based AP Clustering Algorithm for Large Incomplete Data

Genetic Algorithms Applied to Multi-Class Clustering for Gene Expression Data

An Improved Mean Imputation Clustering Algorithm for Incomplete Data

A Three-Way Decisions Clustering Algorithm for Incomplete Data

An approach to dealing with missing values in heterogeneous data using k-nearest neighbors

Interval-valued possibilistic fuzzy C-means clustering algorithm

Hybrid Missing Value Imputation Algorithms Using Fuzzy C-Means and Vaguely Quantified Rough Set

K-Means Clustering With Incomplete Data

A Human-Computer Cooperation Fuzzy C-Means Clustering with Interval-Valued Weights

Research on Incomplete Data Clustering

Robust K-Median and K-Means Clustering Algorithms for Incomplete Data

Interval Attributes Description Based FCM Clustering Algorithm for Noisy Data

Hybrid Genetic Clustering by Using FCM and Geodesic Distance for Complex Distributed Data