Abstract: Many datasets suffer from missing values for various reasons, which not only makes related tasks harder to process but also reduces classification accuracy. The mainstream approach to this problem is to complete the dataset with missing value imputation. Existing imputation methods estimate the missing parts from the observed values in the original feature space, and they treat all features as equally important during data completion, although different features in fact have different importance. We therefore design an imputation method that considers feature importance. The algorithm iteratively performs matrix completion and feature importance learning; specifically, matrix completion is based on a filling loss that incorporates feature importance. Our experimental analysis involves three types of datasets: synthetic datasets with different noisy features and missing values, real-world datasets with artificially generated missing values, and real-world datasets that originally contain missing values. The results on these datasets consistently show that the proposed method outperforms five existing imputation algorithms. To the best of our knowledge, this is the first work that considers feature importance in the imputation model.

What problem does this paper attempt to address?

This paper attempts to solve the problem of missing values in datasets. Many datasets contain missing values for various reasons, which not only makes related tasks harder to process but also reduces classification accuracy. The mainstream remedy is to complete the dataset by imputation. However, existing imputation methods treat all features as equally important during imputation, while in fact the importance of different features varies. The authors therefore designed an imputation method that takes feature importance into account. The algorithm iteratively performs matrix completion and feature importance learning, and incorporates feature importance into the imputation loss. The method aims to improve the imputation quality of important features, thereby better guiding feature selection and classifier construction.

### Main contributions of the paper:

1. **Proposing a new imputation method**: The method takes feature importance into account during imputation and improves imputation quality through iterative matrix completion and feature importance learning.
2. **Introducing feature importance**: Unlike traditional methods, this method dynamically adjusts feature weights during imputation, so that important features are imputed more accurately.
3. **Experimental verification**: Experiments on synthetic and real-world datasets show that the method outperforms five existing imputation algorithms.

### Specific problem description:

- **Missing value problem**: Many real-world datasets contain missing values, which affect subsequent analysis and modeling.
- **Limitations of existing methods**: Most imputation methods do not take feature importance into account, which leads to poor imputation results, especially on high-dimensional datasets.
- **Solution**: By introducing feature importance learning, the method imputes missing values more accurately and helps subsequent tasks such as feature selection and classification.

### Experimental results:

- **Synthetic datasets**: Across different noisy features and missing rates, the method performs better than the five other imputation algorithms.
- **Real-world datasets**: Experiments on multiple real-world datasets show that the method also outperforms the other methods in classification performance.

### Conclusion:

The iterative imputation method based on feature importance proposed in this paper performs well on the missing value problem, with a clear advantage on high-dimensional datasets. It is the first work to introduce feature importance into the imputation model, and it offers a new approach to handling missing values.
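The alternation between feature importance learning and importance-weighted matrix completion can be illustrated with a minimal sketch. This is not the authors' algorithm, whose loss and update rules are not given here. It assumes a hypothetical importance score (the absolute Pearson correlation between each feature and the label) and a hypothetical filling loss of the form sum over entries of `w_j * (X_ij - Z_ij)^2`, minimized over low-rank `Z` by a column-scaled truncated SVD. The function names and the parameters `rank`, `n_iter`, and `eps` are illustrative assumptions.

```python
import numpy as np

def feature_importance(X_filled, y, eps=1e-3):
    """Assumed importance score: |Pearson correlation| of each feature with y."""
    Xc = X_filled - X_filled.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    corr = np.abs(Xc.T @ yc) / denom
    w = corr + eps              # keep every weight strictly positive
    return w / w.sum()          # normalize so the weights sum to 1

def weighted_low_rank(X_filled, w, rank):
    """Rank-`rank` minimizer of sum_ij w_j * (X_ij - Z_ij)^2 for a filled matrix."""
    s = np.sqrt(w)
    # Scaling column j by sqrt(w_j) turns the weighted loss into a plain
    # Frobenius loss, which truncated SVD minimizes.
    U, S, Vt = np.linalg.svd(X_filled * s, full_matrices=False)
    Z_scaled = (U[:, :rank] * S[:rank]) @ Vt[:rank]
    return Z_scaled / s         # undo the column scaling

def iterative_impute(X, y, rank=2, n_iter=20):
    """Alternate importance learning and weighted completion (illustrative only)."""
    mask = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X_filled = np.where(mask, col_means, X)      # start from mean imputation
    for _ in range(n_iter):
        w = feature_importance(X_filled, y)      # step 1: learn feature weights
        Z = weighted_low_rank(X_filled, w, rank) # step 2: weighted completion
        X_filled = np.where(mask, Z, X)          # overwrite only missing entries
    return X_filled, w
```

Observed entries are never modified; only the positions that were `NaN` in the input are re-estimated on each pass. Any supervised importance measure could replace the correlation score in this sketch.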

Iterative missing value imputation based on feature importance
