Incremental Reduction of Imbalanced Distributed Mixed Data Based on K-Nearest Neighbor Rough Set

Weihua Xu, Changchun Liu
DOI: https://doi.org/10.1016/j.ijar.2024.109218
IF: 4.452
2024-01-01
International Journal of Approximate Reasoning
Abstract: Incremental feature selection methods have attracted significant research attention because they improve the efficiency of feature selection on dynamic datasets. However, little research addresses incremental feature selection specifically for imbalanced mixed-type data. Furthermore, the widely used neighborhood rough set algorithm exhibits low classification efficiency on imbalanced data distributions and performs poorly in classifying mixed samples. Motivated by these two challenges, this study investigates an incremental feature reduction algorithm based on k-nearest neighbors and mutual information. First, we extend the neighborhood rough set model with the concept of k-nearest neighbors, improving its ability to handle samples of varying density. Next, drawing on information entropy theory, we combine neighborhood mutual information with the maximum relevance minimum redundancy criterion to construct a novel feature importance evaluation function, which serves as the evaluation metric for feature selection. Finally, we design an incremental feature selection algorithm based on this static algorithm. Experiments on twelve public datasets evaluate the robustness of the proposed feature metric and the performance of the incremental feature selection algorithm. The results confirm the robustness of the proposed metric and show that the incremental algorithm is effective and efficient at feature reduction for imbalanced mixed data that is updated over time.
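The static part of the pipeline described in the abstract (k-nearest-neighbor neighborhoods, neighborhood mutual information, and a maximum relevance minimum redundancy ranking) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no formulas, so the mixed distance (0/1 mismatch for nominal features, absolute difference for numeric features assumed scaled to [0, 1]), the plain k-nearest neighborhood, the commonly used neighborhood mutual information formula, and every function name below are assumptions made for illustration. The incremental update step is not sketched.

```python
import math

def mixed_distance(a, b, features, nominal):
    # Assumed distance: the largest per-feature difference, where nominal
    # features contribute 0/1 and numeric features (scaled to [0, 1])
    # contribute their absolute difference.
    d = 0.0
    for f in features:
        if f in nominal:
            diff = 0.0 if a[f] == b[f] else 1.0
        else:
            diff = abs(a[f] - b[f])
        d = max(d, diff)
    return d

def knn_neighborhoods(X, features, nominal, k):
    # For each sample, the set of its k closest samples (itself included).
    # The sort key places the sample itself first among zero-distance ties.
    n = len(X)
    result = []
    for i in range(n):
        order = sorted(
            range(n),
            key=lambda j: (mixed_distance(X[i], X[j], features, nominal), j != i, j),
        )
        result.append(set(order[:k]))
    return result

def label_neighborhoods(y):
    # For the decision attribute, a sample's neighborhood is its own class.
    return [{j for j, lab in enumerate(y) if lab == y[i]} for i in range(len(y))]

def neighborhood_mi(na, nb):
    # Assumed formula: NMI = -(1/n) * sum_i log(|A_i| * |B_i| / (n * |A_i & B_i|)).
    # A_i & B_i is never empty because sample i belongs to both sets.
    n = len(na)
    total = 0.0
    for a, b in zip(na, nb):
        total += math.log(len(a) * len(b) / (n * len(a & b)))
    return -total / n

def select_features(X, y, nominal, k, m):
    # Greedy ranking: relevance to the labels minus mean redundancy with
    # the features already selected.
    n_features = len(X[0])
    single = {f: knn_neighborhoods(X, [f], nominal, k) for f in range(n_features)}
    decision = label_neighborhoods(y)
    relevance = {f: neighborhood_mi(single[f], decision) for f in single}
    selected = []
    while len(selected) < min(m, n_features):
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(
                neighborhood_mi(single[f], single[s]) for s in selected
            ) / len(selected)
            return relevance[f] - redundancy
        candidates = [f for f in range(n_features) if f not in selected]
        selected.append(max(candidates, key=score))
    return selected
```

A call such as `select_features(X, y, nominal={2}, k=4, m=2)` treats each row of `X` as a sequence indexed by feature position, with feature 2 marked as nominal. Using a fixed neighbor count rather than a fixed radius keeps every neighborhood the same size regardless of local density, which is the property the abstract attributes to the k-nearest-neighbor extension.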