Abstract: Missing value imputation (MVI) is a key task in data science, in which learning models are built from incomplete data. In contrast to externally driven MVI algorithms, this study proposes a novel risk-minimisation-based MVI algorithm (RM-MVI) that considers both the internal characteristics of the missing data and the external performance in specific classification applications. RM-MVI is designed for labelled data and proceeds in two stages: filling with structural risk minimisation (SRM) and refining with empirical risk minimisation (ERM). In the filling stage, an autoencoder with a single hidden layer is trained on the original dataset without missing values. The missing values are first initialised with random numbers, and the imputation values are then preliminarily optimised using a derived updating rule that minimises the structural-risk-oriented objective function. In the refining stage, a neural-network-based classifier is trained to fine-tune these preliminarily optimised imputation values by reducing the empirical risk. Experiments on several benchmark datasets validate the feasibility, rationality, and effectiveness of the proposed RM-MVI algorithm. The results show that (1) the optimisation processes for the imputation values under both SRM and ERM converge, so that optimised imputation values can be obtained; (2) SRM ensures the distribution consistency of the imputation values preliminarily optimised in the filling stage, while ERM fine-tunes them in the refining stage, which is more helpful for classifier training; and (3) RM-MVI yields considerably better MVI performance on the benchmark datasets than 11 well-known MVI algorithms, including a 26% higher distribution consistency ratio and 2% to 5% higher testing accuracies on average for 6 classifiers. This demonstrates that RM-MVI is a viable approach for addressing MVI problems.
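
The filling stage described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the objective function or the derived updating rule, so the sketch assumes a squared reconstruction error with L2 weight decay as a stand-in for the structural risk and plain gradient descent as the update. It also assumes that "the original dataset without missing values" means the rows with no missing entries. All names (`make_data`, `train_autoencoder`, `fill`, `recon_loss`) and all hyperparameters are hypothetical, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n=300, d=6, missing_rate=0.2):
    """Synthetic correlated data plus a random mask (True = missing)."""
    latent = rng.normal(size=(n, 2))
    X = latent @ rng.normal(size=(2, d)) + 0.1 * rng.normal(size=(n, d))
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    mask = rng.random((n, d)) < missing_rate
    return X, mask

def forward(X, params):
    """Single-hidden-layer autoencoder: tanh encoder, linear decoder."""
    W1, b1, W2, b2 = params
    H = np.tanh(X @ W1 + b1)
    return H, H @ W2 + b2

def train_autoencoder(X, hidden=3, lr=0.01, epochs=2000, weight_decay=1e-3):
    """Full-batch gradient descent on squared error plus L2 weight decay
    (an assumed proxy for the structural risk in the abstract)."""
    n, d = X.shape
    W1 = 0.1 * rng.normal(size=(d, hidden)); b1 = np.zeros(hidden)
    W2 = 0.1 * rng.normal(size=(hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H, X_hat = forward(X, (W1, b1, W2, b2))
        G = 2.0 * (X_hat - X) / n            # gradient w.r.t. the output
        dZ = (G @ W2.T) * (1.0 - H ** 2)     # back-propagate through tanh
        gW2 = H.T @ G + weight_decay * W2
        gW1 = X.T @ dZ + weight_decay * W1
        W2 -= lr * gW2; b2 -= lr * G.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * dZ.sum(axis=0)
    return W1, b1, W2, b2

def recon_loss(X, params):
    """Total squared reconstruction error of X under the fixed autoencoder."""
    _, X_hat = forward(X, params)
    return float(np.sum((X - X_hat) ** 2))

def fill(X_init, mask, params, lr=0.01, steps=300):
    """Gradient descent on the missing entries only; weights stay fixed."""
    W1, _, W2, _ = params
    X = X_init.copy()
    for _ in range(steps):
        H, X_hat = forward(X, params)
        R = X - X_hat
        # d/dX of sum((X - AE(X))^2): direct term minus the term through AE.
        grad = 2.0 * R - 2.0 * ((R @ W2.T) * (1.0 - H ** 2)) @ W1.T
        X[mask] -= lr * grad[mask]           # observed entries are untouched
    return X

X_true, mask = make_data()
complete_rows = ~mask.any(axis=1)
params = train_autoencoder(X_true[complete_rows])

# Initialise the missing entries with random numbers, then optimise them.
X_start = np.where(mask, rng.normal(size=X_true.shape), X_true)
X_filled = fill(X_start, mask, params)

loss_before = recon_loss(X_start, params)
loss_after = recon_loss(X_filled, params)
print(f"reconstruction loss: {loss_before:.2f} -> {loss_after:.2f}")
```

Under the same assumptions, a refining stage would reuse the masked update in `fill`, replacing the reconstruction gradient with the gradient of a trained classifier's loss with respect to the missing entries.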

A novel and efficient risk minimization-based missing value imputation algorithm

Machine Learning for Missing Value Imputation

A Robust Missing Value Imputation Method MifImpute For Incomplete Molecular Descriptor Data And Comparative Analysis With Other Missing Value Imputation Methods

Missing Value Imputation on Multidimensional Time Series

Conditional expectation with regularization for missing data imputation

MIDIA: exploring denoising autoencoders for missing data imputation

Missing value imputation: a review and analysis of the literature (2006–2017)

Multiple Imputation with Multivariate Imputation by Chained Equation (mice) Package

MISNN: Multiple Imputation via Semi-parametric Neural Networks

Do we really need imputation in AutoML predictive modeling?

M$^3$-Impute: Mask-guided Representation Learning for Missing Value Imputation

A Novel Missing-Rate-Oriented Selective Algorithm for Handling Missing Data by Minimizing Imputation

A comparative study of evaluating missing value imputation methods in label-free proteomics

A novel machine learning-based imputation strategy for missing data in step-stress accelerated degradation test

Performance comparison of State-of-the-art Missing Value Imputation Algorithms on Some Bench mark Datasets

Hybrid Missing Value Imputation Algorithms Using Fuzzy C-Means and Vaguely Quantified Rough Set

Evaluation of imputation techniques with varying percentage of missing data

An Intelligent Missing Data Imputation Techniques: A Review

iDMI: A novel technique for missing value imputation using a decision tree and expectation-maximization algorithm

Method for Incomplete and Imbalanced Data Based on Multivariate Imputation by Chained Equations and Ensemble Learning

MVIRA: A model based on Missing Value Imputation and Reliability Assessment for mortality risk prediction