Feature Selection for Unbalanced Distribution Hybrid Data Based on $K$-Nearest Neighborhood Rough Set

Weihua Xu, Ziting Yuan, Zheng Liu
DOI: https://doi.org/10.1109/tai.2023.3237203
2023-01-01
IEEE Transactions on Artificial Intelligence
Abstract: Neighborhood rough sets are now widely used to process numerical data. Nevertheless, most existing neighborhood rough set models cannot distinguish class-mixture samples well in classification problems; that is, they cannot classify categories effectively when the data have an unbalanced distribution. To address this, in this article we propose a new feature selection method that takes into account both heterogeneous data and feature interaction. The proposed model integrates the advantages of the $\delta$-neighborhood and the $k$-nearest neighbor, so it can handle heterogeneous data better than existing neighborhood models. We use information-theoretic measures such as mutual information and conditional mutual information, together with an iterative strategy, to define the importance of each feature for decision making. Based on this idea, we further design a feature selection algorithm. Experimental results show that the proposed algorithm outperforms several existing algorithms, in particular the $\delta$-neighborhood rough set model and the $k$-nearest neighborhood rough set model.
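The abstract does not give the model's definitions, so the following is only a minimal Python sketch of the general idea, built on assumptions that are not taken from the paper: a HEOM-style distance (numeric values assumed pre-scaled to [0, 1], categorical values compared by equality), a hybrid neighborhood taken as the samples that are both within $\delta$ of a sample and among its $k$ nearest neighbors, a neighborhood-entropy form of mutual information commonly used with neighborhood rough sets, and greedy forward selection in which a candidate's conditional mutual information with the decision given the selected set $S$ is obtained from the chain rule $I(a; D \mid S) = I(S \cup \{a\}; D) - I(S; D)$. All names (`distance`, `knn_delta_neighborhood`, `neighborhood_mi`, `select_features`, `delta`, `k`, `eps`) are hypothetical.

```python
import math
from typing import List, Sequence, Set, Union

Value = Union[float, str]  # numeric (assumed scaled to [0, 1]) or categorical


def distance(x: Sequence[Value], y: Sequence[Value], feats: Sequence[int]) -> float:
    """Assumed HEOM-style distance: |a - b| for numbers, 0/1 mismatch for strings."""
    total = 0.0
    for f in feats:
        a, b = x[f], y[f]
        if isinstance(a, str) or isinstance(b, str):
            d = 0.0 if a == b else 1.0
        else:
            d = abs(a - b)
        total += d * d
    return math.sqrt(total)


def knn_delta_neighborhood(data, i: int, feats, delta: float, k: int) -> Set[int]:
    """Hypothetical hybrid neighborhood: sample i plus those of its k nearest
    other samples (ties broken by index) that also lie within delta."""
    dists = {j: distance(data[i], data[j], feats) for j in range(len(data)) if j != i}
    nearest = sorted(dists, key=lambda j: (dists[j], j))[:k]
    return {i} | {j for j in nearest if dists[j] <= delta}


def neighborhood_entropy(nbhds: List[Set[int]]) -> float:
    """H = -(1/n) * sum_i log2(|n(x_i)| / n)."""
    n = len(nbhds)
    return -sum(math.log2(len(s) / n) for s in nbhds) / n


def neighborhood_mi(data, labels, feats, delta: float, k: int) -> float:
    """I(B; D) = H(B) + H(D) - H(B, D); the empty feature set carries no information."""
    if not feats:
        return 0.0
    n = len(data)
    nb = [knn_delta_neighborhood(data, i, feats, delta, k) for i in range(n)]
    cls = [{j for j in range(n) if labels[j] == labels[i]} for i in range(n)]
    joint = [nb[i] & cls[i] for i in range(n)]
    return neighborhood_entropy(nb) + neighborhood_entropy(cls) - neighborhood_entropy(joint)


def select_features(data, labels, delta: float = 0.2, k: int = 5, eps: float = 1e-6) -> List[int]:
    """Greedy forward selection driven by conditional mutual information."""
    remaining = list(range(len(data[0])))
    selected: List[int] = []
    current = 0.0  # I(S; D) for the currently selected set S
    while remaining:
        # Chain rule: I(a; D | S) = I(S + [a]; D) - I(S; D).
        gains = [(neighborhood_mi(data, labels, selected + [a], delta, k) - current, a)
                 for a in remaining]
        best_gain, best = max(gains, key=lambda g: (g[0], -g[1]))  # ties -> lower index
        if best_gain <= eps:
            break  # no remaining feature adds information about the decision
        selected.append(best)
        remaining.remove(best)
        current += best_gain
    return selected


if __name__ == "__main__":
    # Toy hybrid table: column 0 separates the classes, column 1 is noise,
    # column 2 is a categorical attribute.
    data = [[0.0, 0.3, "a"], [0.05, 0.9, "b"], [0.1, 0.5, "a"],
            [0.9, 0.4, "b"], [0.95, 0.8, "a"], [1.0, 0.1, "b"]]
    labels = [0, 0, 0, 1, 1, 1]
    print(select_features(data, labels, delta=0.2, k=2))  # → [0]
```

On this toy table the first column alone already attains the upper bound $I(B; D) = H(D)$, so the sketch stops after selecting it. Capping the $\delta$-neighborhood at $k$ samples is just one plausible way to combine the two neighborhood notions; the paper's actual construction may differ.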
What problem does this paper attempt to address?