Binary classification for imbalanced datasets using twin hyperspheres based on conformal method

Jian Zheng, Lin Li, Shiyan Wang, Huyong Yan
DOI: https://doi.org/10.1007/s10586-024-04528-x
2024-05-23
Cluster Computing
Abstract: Targeting binary classification of highly imbalanced data, this paper proposes a novel twin-hypersphere method with conformal transformation. A conformal mapping is applied to the original data region so that the hyperspheres can search the region containing the majority class while paying more attention to the region containing the minority class. Meanwhile, to tighten the classification boundaries learned by the hyperspheres, a gain operation is applied to the kernels. Experimental results show that the accuracy of the classification boundaries learned by the proposed method reaches 0.880 on the synthetic datasets. Results also show that our classification accuracy reaches 0.731 on the highly imbalanced dataset with an imbalance ratio of 87.8:1, outperforming the competing methods by a significant margin. Moreover, the time consumption of the proposed method does not increase exponentially, which makes it suitable for large-scale classification scenarios. We find that non-linear kernels are better at focusing on global regions, while conformal transformation helps them better perceive sub-regions. Conformal transformation also helps in observing the regions that contain hard-to-observe minority classes.
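The abstract does not give the paper's formulas, so what follows is only a minimal sketch under stated assumptions of how a conformally transformed kernel and a twin-hypersphere decision rule can fit together. It uses the classic conformal kernel form K~(x, y) = D(x) K(x, y) D(y) with a hypothetical factor D(x) = 1 + Σ_p exp(-‖x − p‖² / (2τ²)) anchored on minority samples. It fits one hypersphere per class in the induced feature space, taking the centre as the mean of the mapped points rather than an optimised centre. A point is then assigned to the class with the smaller relative distance ‖φ(z) − c_k‖² / R_k². The function names, the factor D, the parameters `gamma` and `tau`, and the toy data are all illustrative assumptions. The paper's gain operation on the kernels is not specified in the abstract and is omitted.

```python
import math


def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) base kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def conformal_factor(x, anchors, tau=1.0):
    """Hypothetical conformal factor D(x) = 1 + sum_p exp(-||x - p||^2 / (2 tau^2)).

    D is about 1 far from the anchors and grows near them, so the mapping
    magnifies the neighbourhood of the anchor points (here: minority samples).
    """
    return 1.0 + sum(
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * tau ** 2))
        for p in anchors
    )


def conformal_kernel(base, anchors, tau=1.0):
    """Return the transformed kernel K~(x, y) = D(x) * K(x, y) * D(y)."""
    def k(x, y):
        return (conformal_factor(x, anchors, tau) * base(x, y)
                * conformal_factor(y, anchors, tau))
    return k


class KernelSphere:
    """One class's hypersphere in kernel feature space.

    Simplification (assumption): the centre is the mean of the mapped training
    points, and the squared radius is the largest squared distance of a
    training point to that centre.
    """

    def __init__(self, points, kernel):
        self.points = list(points)
        self.kernel = kernel
        n = len(self.points)
        # (1/n^2) * sum_ij K(x_i, x_j): the ||c||^2 term shared by every distance.
        self.const = sum(kernel(a, b) for a in self.points for b in self.points) / n ** 2
        # Guard against a zero radius when all points coincide.
        self.r2 = max(max(self.dist2(p) for p in self.points), 1e-12)

    def dist2(self, z):
        """||phi(z) - c||^2 = K(z, z) - (2/n) * sum_i K(z, x_i) + ||c||^2."""
        cross = sum(self.kernel(z, p) for p in self.points) / len(self.points)
        return self.kernel(z, z) - 2.0 * cross + self.const


def predict(z, sphere_pos, sphere_neg):
    """Label z with the class whose sphere is relatively closer (d^2 / R^2)."""
    rel_pos = sphere_pos.dist2(z) / sphere_pos.r2
    rel_neg = sphere_neg.dist2(z) / sphere_neg.r2
    return 1 if rel_pos <= rel_neg else -1


# Toy data (illustrative only): six majority points, two minority points.
majority = [(0.0, 0.0), (0.5, 0.2), (-0.3, 0.4), (0.2, -0.5), (-0.4, -0.2), (0.1, 0.3)]
minority = [(3.0, 3.0), (3.3, 2.8)]

kernel = conformal_kernel(lambda x, y: rbf(x, y, gamma=0.5), anchors=minority, tau=1.0)
pos = KernelSphere(minority, kernel)  # minority class, label +1
neg = KernelSphere(majority, kernel)  # majority class, label -1

print(predict((3.1, 2.9), pos, neg))  # near the minority pair -> 1
print(predict((0.0, 0.1), pos, neg))  # inside the majority cluster -> -1
```

Anchoring D on minority samples stretches feature-space distances near the minority region, which is one possible reading of "paying more attention" to it. Because D(x)D(y) is a rank-one positive factor, the transformed kernel stays positive semi-definite. A full twin-hypersphere SVM would instead solve an optimisation problem per class, with slack variables, to obtain each centre and radius.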
computer science, information systems, theory & methods