Abstract: Multiple imputation has become a key tool for handling missing data in data analytics, and its robust variants improve the precision and reliability of the results. Non-robust methods can be swayed by a few extreme outliers, leading to skewed imputations and biased estimates. This applies both to representative outliers, which are true but unusual values in the population, and to non-representative outliers, which are measurement errors. Detecting such outliers in large or high-dimensional data sets is often difficult. Robust imputation methods address this problem: they resist the influence of outliers and so provide a more reliable way of dealing with missing data. They also tolerate an imputation model that does not fit perfectly, absorbing slight deviations from the model without compromising overall stability. A new robust imputation algorithm is introduced that addresses three challenges of robustness. It uses robust bootstrapping to account for model uncertainty when imputing a random sample; it uses robust fitting to improve accuracy; and it accounts for imputation uncertainty in a robust manner. Furthermore, any complex regression or classification model for any variable with missing data can be run through the algorithm. With this algorithm, we move one step closer to accurate and reliable handling of missing data.
Using a realistic data set and a simulation study including a sensitivity analysis, the new algorithm imputeRobust shows excellent performance compared with other common methods. Effectiveness was demonstrated by measures of precision for the prediction error, the coverage rates, and the mean squared errors of the estimators, as well as by visual comparisons.
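
The three ingredients named in the abstract (resampling for model uncertainty, a robust fit, and a noise draw for imputation uncertainty) can be illustrated with a minimal sketch. This is not the imputeRobust implementation. It assumes a single numeric predictor, uses a Huber M-estimator fitted by iteratively reweighted least squares as a stand-in for the robust fit, uses an ordinary bootstrap rather than a robust one, and draws Gaussian noise scaled by the median absolute deviation (MAD) of the residuals. All function names are hypothetical.

```python
import random
import statistics


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def mad_scale(res):
    """Robust residual scale: 1.4826 * median absolute deviation."""
    med = statistics.median(res)
    return 1.4826 * statistics.median([abs(r - med) for r in res])


def huber_line(x, y, c=1.345, n_iter=50):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    w = [1.0] * len(x)  # first pass is ordinary least squares
    for _ in range(n_iter):
        a, b = weighted_line(x, y, w)
        res = [yi - a - b * xi for xi, yi in zip(x, y)]
        s = max(mad_scale(res), 1e-12)
        # Huber weights: 1 for small residuals, c*s/|r| for large ones.
        w = [min(1.0, c * s / max(abs(r), 1e-12)) for r in res]
    return a, b, s


def impute_robust_sketch(x, y, m=5, seed=0):
    """Return m completed copies of y; missing entries are marked by None."""
    rng = random.Random(seed)
    obs = [i for i, yi in enumerate(y) if yi is not None]
    completed = []
    for _ in range(m):
        # Model uncertainty: refit on a bootstrap sample of the observed cases
        # (ordinary bootstrap here, an assumption of this sketch).
        boot = [rng.choice(obs) for _ in obs]
        a, b, s = huber_line([x[i] for i in boot], [y[i] for i in boot])
        # Imputation uncertainty: add noise with the robust residual scale.
        completed.append([
            yi if yi is not None else a + b * xi + rng.gauss(0.0, s)
            for xi, yi in zip(x, y)
        ])
    return completed
```

Calling `impute_robust_sketch(x, y, m=5)` yields five completed data sets. Each would be analysed separately and the results pooled, as in standard multiple imputation.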

The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning

Multiple Imputation with Denoising Autoencoder using Metamorphic Truth and Imputation Feedback

MIDIA: exploring denoising autoencoders for missing data imputation

Are deep learning models superior for missing data imputation in large surveys? Evidence from an empirical comparison

mDAE : modified Denoising AutoEncoder for missing data imputation

Proposition of a Theoretical Model for Missing Data Imputation using Deep Learning and Evolutionary Algorithms

Missing Value Imputation on Multidimensional Time Series

Long-Term Missing Value Imputation for Time Series Data Using Deep Neural Networks

Missing Features Reconstruction Using a Wasserstein Generative Adversarial Imputation Network

Multiple Imputation with Neural Network Gaussian Process for High-dimensional Incomplete Data

Siamese autoencoder architecture for the imputation of data missing not at random

Benchmarking Machine Learning Missing Data Imputation Methods in Large-Scale Mental Health Survey Databases

Multistage Large Segment Imputation Framework Based on Deep Learning and Statistic Metrics

Enhancing Precision in Large-Scale Data Analysis: An Innovative Robust Imputation Algorithm for Managing Outliers and Missing Values

Do we really need imputation in AutoML predictive modeling?

M-DEW: Extending Dynamic Ensemble Weighting to Handle Missing Values

An Intelligent Missing Data Imputation Techniques: A Review

Missing Value Estimation using Clustering and Deep Learning within Multiple Imputation Framework

Missing data imputation with adversarially-trained graph convolutional networks

Handling missing data through deep convolutional neural network

DiffImpute: Tabular Data Imputation With Denoising Diffusion Probabilistic Model