Iterative missing value imputation based on feature importance

Cong Guo, Chun Liu, Wei Yang
2023-11-14
Abstract: Many datasets suffer from missing values for various reasons, which not only makes related tasks harder to process but also reduces classification accuracy. The mainstream approach to this problem is to complete the dataset by missing value imputation. Existing imputation methods estimate the missing parts from the observed values in the original feature space, and they treat all features as equally important during data completion, while in fact different features have different importance. We therefore design an imputation method that considers feature importance. The algorithm iteratively performs matrix completion and feature importance learning; specifically, matrix completion is based on a filling loss that incorporates feature importance. Our experimental analysis involves three types of datasets: synthetic datasets with different noisy features and missing values, real-world datasets with artificially generated missing values, and real-world datasets that originally contain missing values. The results on these datasets consistently show that the proposed method outperforms five existing imputation algorithms. To the best of our knowledge, this is the first work that considers feature importance in the imputation model.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the problem of missing values in datasets. Many datasets contain missing values for various reasons, which makes related tasks harder to process and reduces classification accuracy. The mainstream remedy is to complete the dataset by imputation. However, existing imputation methods treat all features as equally important during imputation, while in fact the importance of different features varies. The authors therefore designed an imputation method that takes feature importance into account. The algorithm alternates between matrix completion and feature importance learning, and it incorporates feature importance into the imputation loss. The aim is to improve the imputation quality of important features, and thereby to better guide feature selection and classifier construction.

### Main contributions of the paper:

1. **Proposing a new imputation method**: The method takes feature importance into account during imputation and improves imputation quality through iterative matrix completion and feature importance learning.
2. **Introducing feature importance**: Unlike traditional methods, this method dynamically adjusts feature weights during imputation, so that important features are imputed more accurately.
3. **Experimental verification**: Experiments on synthetic and real-world datasets show that the method outperforms five existing imputation algorithms.

### Specific problem description:

- **Missing value problem**: Many real-world datasets contain missing values, which affect subsequent analysis and modeling.
- **Limitations of existing methods**: Most imputation methods do not take feature importance into account, which leads to poor imputation quality, especially on high-dimensional datasets.
- **Solution**: By introducing feature importance learning, the method imputes missing values more accurately and supports subsequent tasks such as feature selection and classification.

### Experimental results:

- **Synthetic datasets**: Across different noisy features and missing rates, the method performs better than the other five imputation algorithms.
- **Real-world datasets**: Experiments on multiple real-world datasets show that the method also outperforms the other methods in classification performance.

### Conclusion:

The iterative, feature-importance-based imputation method proposed in this paper performs well on the missing value problem, with a clear advantage on high-dimensional datasets. To the authors' knowledge, this is the first work to introduce feature importance into the imputation model, and it offers a new approach to handling missing values.
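
The alternation between matrix completion and feature importance learning described above can be illustrated with a minimal sketch. This is not the authors' implementation: the text does not specify the loss or the importance learner, so the sketch assumes a low-rank factorization fitted by weighted alternating least squares as the completion step, and the absolute correlation between each feature and the class label as a stand-in importance score. The function names (`weighted_low_rank_impute`, `feature_importance`, `iterative_impute`) and all parameters (`rank`, `lam`, `n_iter`, `n_rounds`, `floor`) are hypothetical.

```python
import numpy as np


def weighted_low_rank_impute(X, mask, w, rank=2, lam=0.1, n_iter=30):
    """Fill missing entries with a rank-`rank` model X ~ U @ V.T.

    Assumed loss (illustrative, not taken from the paper):
        sum_j w[j] * sum_{i observed in j} (X[i, j] - U[i] . V[j])**2
        + lam * (||U||^2 + ||V||^2)
    so errors on features with a larger weight w[j] cost more.
    """
    n, d = X.shape
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(d, rank))
    eye = lam * np.eye(rank)
    for _ in range(n_iter):
        # Row factors: weighted ridge regression over each row's observed features.
        for i in range(n):
            obs = mask[i]
            Vw = V[obs] * w[obs][:, None]
            U[i] = np.linalg.solve(Vw.T @ V[obs] + eye, Vw.T @ X[i, obs])
        # Column factors: ridge regression over each feature's observed rows.
        for j in range(d):
            obs = mask[:, j]
            Uo = U[obs]
            V[j] = np.linalg.solve(w[j] * Uo.T @ Uo + eye, w[j] * Uo.T @ X[obs, j])
    # Keep observed values, use the model only where data is missing.
    return np.where(mask, X, U @ V.T)


def feature_importance(X_filled, y, floor=1e-3):
    """Stand-in importance: |Pearson correlation| with y, rescaled to mean 1."""
    Xc = X_filled - X_filled.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    score = np.abs(Xc.T @ yc) / denom + floor  # floor keeps every weight positive
    return score * len(score) / score.sum()


def iterative_impute(X, y, n_rounds=3, rank=2):
    """Alternate weighted completion and importance learning for n_rounds."""
    mask = ~np.isnan(X)          # True where a value is observed
    w = np.ones(X.shape[1])      # start with all features equally important
    for _ in range(n_rounds):
        X_filled = weighted_low_rank_impute(X, mask, w, rank=rank)
        w = feature_importance(X_filled, y)
    return X_filled, w
```

Calling `iterative_impute(X, y)` on a float array `X` with `np.nan` marking missing entries and a numeric label vector `y` returns the completed matrix and the final feature weights. Starting from uniform weights means the first round behaves like an importance-agnostic imputer, and later rounds shift the fit toward features that the current importance estimate favors.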