Abstract: Missing data is one of the challenges researchers face when extracting information from data. The first step in addressing it is to prepare the data for processing. Much work has been done in this area; deleting instances with missing values is a popular approach, but it has drawbacks, including bias that can degrade the results. How missing values should be handled depends on several factors, including data types, missing rates, and missing mechanisms. This work covers missing data patterns as well as the three missing mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Alternatives to deletion include numerous imputation techniques, which fall into categories such as statistical and machine learning methods. One strategy for improving a model's output is to weight the feature values so that classification or regression approaches perform better. This research developed a new imputation technique called correlation coefficient min-max weighted imputation (CCMMWI). It combines the correlation coefficient and min-max normalization to balance the feature values. The proposed technique seeks to increase the contribution of features by considering how those features relate to the desired functionality. To assess the results, we compared CCMMWI against several established techniques: the statistical techniques mean and EM imputation, and the machine learning imputation techniques k-NNI and MICE. The evaluation also included the imputation techniques CBRL, CBRC, and ExtraImpute. We used datasets of various sizes, a range of missing rates, and random missing patterns. Finally, we compared the imputed datasets with the original data using the root mean squared error (RMSE), mean absolute error (MAE), and R². According to the findings, the proposed CCMMWI performs better than most of the other techniques in nearly all missing-rate scenarios.
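The evaluation protocol described above (hide values at random, impute, then score the imputed entries against the originals) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names `mask_mcar`, `mean_impute`, and `imputation_scores` are hypothetical, the imputer shown is the plain column-mean baseline rather than CCMMWI, and scoring only the artificially hidden cells is an assumption about how the metrics are applied.

```python
import math
import random


def mask_mcar(rows, rate, seed=0):
    """Hide each cell independently with probability `rate` (an MCAR pattern).

    Hidden cells are represented by None. Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def mean_impute(rows):
    """Baseline imputer: replace each None with the mean of its column's observed values.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def imputation_scores(original, masked, imputed):
    """Compute RMSE, MAE, and R^2 over the cells that were hidden in `masked`.

    Assumes at least one hidden cell and that the hidden true values are not all equal.
    """
    truth, pred = [], []
    for row_o, row_m, row_p in zip(original, masked, imputed):
        for o, m, p in zip(row_o, row_m, row_p):
            if m is None:
                truth.append(o)
                pred.append(p)
    errors = [t - p for t, p in zip(truth, pred)]
    ss_res = sum(e * e for e in errors)
    mean_truth = sum(truth) / len(truth)
    ss_tot = sum((t - mean_truth) ** 2 for t in truth)
    return {
        "rmse": math.sqrt(ss_res / len(errors)),
        "mae": sum(abs(e) for e in errors) / len(errors),
        "r2": 1.0 - ss_res / ss_tot,
    }


if __name__ == "__main__":
    # Synthetic data, made up purely to exercise the functions.
    rng = random.Random(42)
    data = [[rng.gauss(0, 1), rng.gauss(5, 2), rng.gauss(-3, 0.5)] for _ in range(200)]
    masked = mask_mcar(data, rate=0.2, seed=1)
    print(imputation_scores(data, masked, mean_impute(masked)))
```

Lower RMSE and MAE and higher R² indicate imputed values closer to the originals. Any other imputer can be swapped in for `mean_impute` and scored on the same mask, which is what makes results comparable across methods and missing rates.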

Information-Content-Informed Kendall-tau Correlation: Utilizing Missing Values

Impact of Missing Data on Correlation Coefficient Values: Deletion and Imputation Methods for Data Preparation

Multivariate Analysis of Data Sets with Missing Values: An Information Theory-Based Reliability Function

Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies

R2CI: Information theoretic-guided feature selection with multiple correlations

IRTCI: Item Response Theory for Categorical Imputation

The Missing Person problem through the lens of information theory

kendallknight: An R Package for Efficient Implementation of Kendall's Correlation Coefficient Computation

Exploring Inter-Sensor Correlation for Missing Data Estimation

Missing data imputation using correlation coefficient and min-max normalization weighting

Estimating and accounting for unobserved covariates in high dimensional correlated data

Missing Value Estimation Algorithms on Cluster and Representativeness Preservation of Gene Expression Microarray Data

Gene ranking and biomarker discovery under correlation

Correlation visualization under missing values: a comparison between imputation and direct parameter estimation methods

Hybrid Missing Value Imputation Algorithm- KLR

Fast matrix completion in epigenetic methylation studies with informative covariates

Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data

A New Correlation Coefficient for Aggregating Non-strict and Incomplete Rankings

Kendall Correlation Coefficients for Portfolio Optimization

Imputation of missing values in lipidomic datasets

A novel ranked k-nearest neighbors algorithm for missing data imputation