Enhancing the generalization of feature construction using genetic programming for imbalanced data with augmented non-overlap degree.

Zhuang Li, Jingyan Qin, Haiyan Gong, Xiaotong Zhang, Yadong Wan
DOI: https://doi.org/10.1109/BIBM52615.2021.9669863
2021-01-01
Abstract: Genetic programming (GP) has achieved significant results in feature construction, and the non-overlap degree can help improve the generalization ability of GP-based feature construction. However, the non-overlap degree is biased towards the majority class. This paper proposes a novel GP-based feature construction method with an augmented non-overlap degree to enhance generalization on imbalanced data. The constructed features are evaluated by a novel function that combines the area under the ROC curve metric with the augmented non-overlap degree. Generalization performance is evaluated not only with one particular classification algorithm but also with six widely used classification algorithms. Experiments on five imbalanced biomedical datasets with different imbalance rates show that the proposed GP-AANO method achieves superior generalization performance for classification.