Abstract: Missing data is one of the challenges a researcher encounters when attempting to draw information from data. The first step in addressing it is to prepare the data for processing. Much effort has been made in this area. Removing instances with missing data is a popular way of handling the problem, but it has drawbacks, including bias, which negatively affects the results. How missing values should be handled depends on several factors, including data types, missing rates, and missing mechanisms. This paper covers missing data patterns as well as the three mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Other proposals rely on imputation techniques, which fall into several categories, such as statistical and machine learning methods. One strategy for improving a model's output is to weight the feature values so that classification or regression approaches perform better. This research developed a new imputation technique called correlation coefficient min-max weighted imputation (CCMMWI). It combines the correlation coefficient and min-max normalization to balance the feature values. The proposed technique seeks to increase the contribution of features by considering how those features relate to the desired output. To assess the findings, we compared against several established techniques: the statistical techniques mean and EM imputation, the machine learning techniques k-NNI and MICE, and the imputation techniques CBRL, CBRC, and ExtraImpute. We used datasets of various sizes, a range of missing rates, and random missing patterns. Finally, we compared the imputed datasets with the original data using the root mean squared error (RMSE), mean absolute error (MAE), and R². According to the findings, the proposed CCMMWI performs better than most of the other techniques in nearly all missing-rate scenarios.
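The evaluation protocol described in the abstract (remove values at random, impute, then compare the imputed entries with the originals using RMSE, MAE, and R²) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the function names `mcar_mask`, `mean_impute`, and `imputation_scores` are hypothetical, the data is synthetic, the 20% missing rate is arbitrary, and plain column-mean imputation stands in for the imputer. The abstract does not specify the CCMMWI algorithm itself, so it is not reproduced here.

```python
import numpy as np

def mcar_mask(shape, rate, rng):
    """Boolean mask, True where a value is removed completely at random."""
    return rng.random(shape) < rate

def mean_impute(x_missing):
    """Fill NaNs in each column with that column's observed mean."""
    col_means = np.nanmean(x_missing, axis=0)
    return np.where(np.isnan(x_missing), col_means, x_missing)

def imputation_scores(original, imputed, mask):
    """RMSE, MAE and R^2 computed over the masked (imputed) entries only."""
    y_true, y_pred = original[mask], imputed[mask]
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2

# Synthetic data and an assumed 20% missing rate, for illustration only.
rng = np.random.default_rng(0)
original = rng.normal(size=(200, 4))
mask = mcar_mask(original.shape, rate=0.2, rng=rng)
x_missing = np.where(mask, np.nan, original)

imputed = mean_impute(x_missing)
rmse, mae, r2 = imputation_scores(original, imputed, mask)
print(f"RMSE={rmse:.3f} MAE={mae:.3f} R2={r2:.3f}")
```

Scoring only the masked entries keeps the observed values, which every imputer leaves unchanged, from diluting the error. Any other imputer can be swapped in for `mean_impute` as long as it returns an array of the same shape.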

Missing Value Imputation via Clusterwise Linear Regression

Missing value imputation using unsupervised machine learning techniques

Missing Data Imputation: Focusing on Single Imputation

Missing Value Estimation Algorithms on Cluster and Representativeness Preservation of Gene Expression Microarray Data

Optimal Clustering with Missing Values

A Novel Fuzzy Rough Clustering Parameter-based missing value imputation

Missing Value Estimation using Clustering and Deep Learning within Multiple Imputation Framework

Missing data imputation using correlation coefficient and min-max normalization weighting

Missing Data Imputation for Classification Problems

Hybrid Missing Value Imputation Algorithms Using Fuzzy C-Means and Vaguely Quantified Rough Set

Imputations for High Missing Rate Data in Covariates Via Semi-supervised Learning Approach

Usage of Clustering and Weighted Nearest Neighbors for Efficient Missing Data Imputation of Microarray Gene Expression Dataset

Missing Data Imputation by Utilizing Information Within Incomplete Instances

Handling missing data in model-based clustering

Win-Win: On Simultaneous Clustering and Imputing over Incomplete Data

An approach to dealing with missing values in heterogeneous data using k-nearest neighbors

Missing data imputation using classification and regression trees

Choosing Appropriate Imputation Methods for Missing Data: A Decision Algorithm on Methods for Missing Data

A novel ranked k-nearest neighbors algorithm for missing data imputation

Hybrid Missing Value Imputation Algorithm- KLR

Missing Values Imputation Based on Iterative Learning