Detecting Overlapping Areas in Unbalanced High-Dimensional Data Using Neighborhood Rough Set and Genetic Programming
Wenbin Pei, Bing Xue, Lin Shang, Mengjie Zhang
DOI: https://doi.org/10.1109/tevc.2022.3203862
IF: 16.497
2023-01-01
IEEE Transactions on Evolutionary Computation
Abstract: Unbalanced classification has attracted widespread interest because of its broad applications. However, mainly because of the uneven class distribution, constructed classifiers are usually biased toward the majority class and therefore perform poorly on the minority class. Unfortunately, the minority class is often the class of interest in real-world applications. High dimensionality often further degrades classification performance, making the class imbalance issue harder to address. Genetic programming (GP) has been applied to construct classifiers that simultaneously select good-quality features to improve classification performance. To handle the class imbalance issue, cost-sensitive GP classifiers treat the minority class as more important than the majority class, but this may decrease accuracy in overlapping areas, where the prior probabilities of the two classes are almost the same. To date, most cost-sensitive classification methods have not specifically investigated how to avoid the impact of overlapping areas on cost-sensitive classifiers. In this study, we propose a new cost-sensitive GP method in which rough set theory is employed to detect overlapping areas before cost-sensitive classifiers are trained on unbalanced high-dimensional data. The proposed method is compared with 46 popular classification methods, comprising 10 GP methods and 36 non-GP methods, on 14 datasets that are unbalanced and high dimensional. The experimental results indicate that the proposed method performs better than the compared methods in almost all cases.
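The abstract does not give the details of the detection step, so the following is only a minimal sketch of the general idea behind a neighborhood rough set boundary region, not the authors' method. It assumes Euclidean distance and a fixed neighborhood radius; the function name `neighborhood_boundary`, the parameter `delta`, and the toy data are all hypothetical. A sample is flagged as lying in an overlapping area when its neighborhood contains instances of more than one class.

```python
import math
from typing import List, Sequence


def neighborhood_boundary(
    X: Sequence[Sequence[float]], y: Sequence[int], delta: float
) -> List[int]:
    """Return indices of samples whose delta-neighborhood mixes classes.

    Assumption: Euclidean distance and a single fixed radius `delta`;
    the abstract specifies neither.
    """
    boundary = []
    for i, xi in enumerate(X):
        # Class labels of every sample within distance delta of xi (xi included).
        labels = {y[j] for j, xj in enumerate(X) if math.dist(xi, xj) <= delta}
        # A pure neighborhood belongs to a lower approximation;
        # a mixed one belongs to the boundary (overlapping) region.
        if len(labels) > 1:
            boundary.append(i)
    return boundary


# Hypothetical toy data: two well-separated clusters plus two close points
# from different classes in between.
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (0.5, 0.5), (0.55, 0.5)]
y = [0, 0, 1, 1, 0, 1]

overlap = neighborhood_boundary(X, y, delta=0.2)
print(overlap)  # [4, 5]
```

In this sketch, the flagged indices could then be treated separately from the rest of the training set before a cost-sensitive classifier is fitted; how the proposed method actually uses the detected areas is not stated in the abstract.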