Abstract: Missing data is one of the challenges a researcher encounters when extracting information from data. The first step in addressing it is to prepare the data for processing. Much effort has been made in this area; removing instances with missing values is a popular approach, but it has drawbacks, including bias, which negatively affects the results. How missing values should be handled depends on several factors, including data types, missing rates, and missing mechanisms. This work covers missing data patterns as well as the three missing mechanisms: missing at random, missing completely at random, and missing not at random. Alternatives to deletion include numerous imputation techniques, which fall into categories such as statistical and machine learning methods. One strategy for improving a model's output is to weight the feature values so that classification or regression approaches perform better. This research developed a new imputation technique called correlation coefficient min-max weighted imputation (CCMMWI). It combines the correlation coefficient and min-max normalization to weight the feature values. The proposed technique seeks to increase the contribution of features by considering how those features relate to the desired functionality. To assess the findings, we compared CCMMWI against several established techniques: the statistical techniques mean and EM imputation, and the machine learning imputation techniques k-NNI and MICE. The evaluation also included the imputation techniques CBRL, CBRC, and ExtraImpute. We used datasets of various sizes, a range of missing rates, and random missingness patterns. Finally, we compared the imputed datasets against the original data using the root mean squared error (RMSE), mean absolute error (MAE), and R². According to the findings, the proposed CCMMWI performs better than most other solutions in practically all missing-rate scenarios.
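The abstract does not spell out the CCMMWI procedure, so the following Python sketch only illustrates the general idea it names and should not be read as the authors' algorithm. It assumes, purely for illustration, that each feature is min-max scaled, that its weight is the absolute Pearson correlation with the column being imputed, and that a missing value is filled with the mean of the k nearest donor rows under the weighted distance. All function names (`min_max`, `pearson`, `impute_column`) and the synthetic data are hypothetical. The sketch then scores the result against plain mean imputation using RMSE, MAE, and R² on the masked entries.

```python
import math
import random

def min_max(column):
    """Scale observed values to [0, 1]; None (missing) entries stay None."""
    observed = [v for v in column if v is not None]
    lo, hi = min(observed), max(observed)
    span = (hi - lo) or 1.0  # guard against constant columns
    return [None if v is None else (v - lo) / span for v in column]

def pearson(xs, ys):
    """Pearson correlation over positions where both values are observed."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def impute_column(data, target, k=3):
    """Fill None entries of column `target` from the k nearest donor rows.

    Assumption (not from the paper): distance uses min-max scaled features,
    each weighted by |corr(feature, target)|.
    """
    cols = list(zip(*data))
    scaled = [min_max(c) for c in cols]
    weights = [0.0 if j == target else abs(pearson(cols[j], cols[target]))
               for j in range(len(cols))]
    donors = [i for i, row in enumerate(data) if row[target] is not None]
    result = [list(row) for row in data]
    for i, row in enumerate(data):
        if row[target] is not None:
            continue

        def dist(d):
            total = 0.0
            for j, w in enumerate(weights):
                a, b = scaled[j][i], scaled[j][d]
                if a is not None and b is not None:
                    total += w * (a - b) ** 2
            return math.sqrt(total)

        nearest = sorted(donors, key=dist)[:k]
        result[i][target] = sum(data[d][target] for d in nearest) / len(nearest)
    return result

def rmse(true, pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))

def mae(true, pred):
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)

def r2(true, pred):
    mean_t = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for t, p in zip(true, pred))
    ss_tot = sum((t - mean_t) ** 2 for t in true)
    return 1 - ss_res / ss_tot

# Synthetic data: x1 drives y, x2 is unrelated noise.
random.seed(0)
rows = []
for _ in range(60):
    x1 = random.uniform(0, 10)
    x2 = random.uniform(0, 100)
    rows.append([x1, x2, 3 * x1 + random.gauss(0, 0.5)])

# Mask 20% of y completely at random and keep the true values for scoring.
masked = random.sample(range(len(rows)), 12)
truth = [rows[i][2] for i in masked]
incomplete = [list(r) for r in rows]
for i in masked:
    incomplete[i][2] = None

filled = impute_column(incomplete, target=2, k=3)
weighted_pred = [filled[i][2] for i in masked]
observed = [r[2] for r in incomplete if r[2] is not None]
mean_pred = [sum(observed) / len(observed)] * len(masked)

for name, pred in [("weighted k-NN", weighted_pred), ("mean", mean_pred)]:
    print(f"{name}: RMSE={rmse(truth, pred):.3f} "
          f"MAE={mae(truth, pred):.3f} R2={r2(truth, pred):.3f}")
```

Because the weights come from correlation with the target column, the noise feature `x2` contributes little to the distance, which is the intuition behind weighting features by their relevance. A real evaluation would repeat this over several datasets, missing rates, and missing mechanisms, as the abstract describes.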

A Hierarchical Missing Value Imputation Method By Correlation-Based K-Nearest Neighbors

A novel ranked k-nearest neighbors algorithm for missing data imputation

APT-KNN: An Efficient Missing Value Imputation Method Oriented Toward Classification Issue

Missing Values Imputation Based on Iterative Learning

Impact of Missing Data on Correlation Coefficient Values: Deletion and Imputation Methods for Data Preparation

Missing data imputation by K nearest neighbours based on grey relational structure and mutual information

An Empirical Study of Dynamic Incomplete-Case Nearest Neighbor Imputation in Software Quality Data

Integrated ECOD-KNN Algorithm for Missing Values Imputation in Datasets: Outlier Removal

Hybrid Missing Value Imputation Algorithm - KLR

Missing Value Imputation via Clusterwise Linear Regression

An approach to dealing with missing values in heterogeneous data using k-nearest neighbors

Missing data imputation using correlation coefficient and min-max normalization weighting

Missing Data Imputation for Classification Problems

Numerical Data Imputation for Multimodal Data Sets: A Probabilistic Nearest-Neighbor Kernel Density Approach

Enriching Data Imputation with Extensive Similarity Neighbors

Hybrid Missing Value Imputation Algorithms Using Fuzzy C-Means and Vaguely Quantified Rough Set

Dimensional Data KNN-Based Imputation

Performance Comparison of Hot-Deck Imputation, K-Nearest Neighbor Imputation, and Predictive Mean Matching in Missing Value Handling, Case Study: March 2019 SUSENAS Kor Dataset

Exploiting nearest neighbor data and fuzzy membership function to address missing values in classification

Temporal and Spatial Nearest Neighbor Values Based Missing Data Imputation in Wireless Sensor Networks

Correlation visualization under missing values: a comparison between imputation and direct parameter estimation methods