Genetic Programming for Borderline Instance Detection in High-Dimensional Unbalanced Classification

Wenbin Pei,Bing Xue,Lin Shang,Mengjie Zhang
DOI: https://doi.org/10.1145/3449639.3459284
2021-01-01
Abstract:In classification, when class overlap is intertwined with the issue of class imbalance, it is often challenging to discover useful patterns because of an ambiguous boundary between the majority class and the minority class. This becomes more difficult if the data is high-dimensional. To date, very few pieces of work have investigated how the class overlap issue can be effectively addressed or alleviated in classification with high-dimensional unbalanced data. In this paper, we propose a new genetic programming based method, which is able to automatically and directly detect borderline instances, in order to address the class overlap issue in classification with high-dimensional unbalanced data. In the proposed method, each individual has two trees to be trained together based on different classification rules. The proposed method is examined and compared with baseline methods on high-dimensional unbalanced datasets. Experimental results show that the proposed method achieves better classification performance than the baseline methods in almost all cases.
What problem does this paper attempt to address?