Label correlation guided borderline oversampling for imbalanced multi-label data learning

Kai Zhang, Zhaoyang Mao, Peng Cao, Wei Liang, Jinzhu Yang, Weiping Li, Osmar R. Zaiane
DOI: https://doi.org/10.1016/j.knosys.2023.110938
Impact factor (IF): 8.139
2023-09-01
Knowledge-Based Systems
Abstract: Multi-label data classification has received much attention due to its wide range of application domains. Unfortunately, multi-label datasets often suffer from class imbalance, which poses challenges for classification algorithms. Oversampling is one of the most important remedies: it generates synthetic instances of minority labels to balance the class distribution. However, existing oversampling methods ignore label correlations, which results in inappropriate synthetic minority samples and makes multi-label classification harder. In this work, we propose an oversampling method that takes label correlations into account and identifies two critical boundary regions in which synthetic minority samples are generated. Moreover, we propose a weighting strategy that assigns weights to the instances in these regions based on their distance information. We evaluated the proposed method on sixteen public datasets. The results show that our approach outperforms state-of-the-art approaches on various evaluation metrics, such as Macro F1 and Macro AUC. The code is available at https://github.com/IntelliDAL/Multi-label/tree/main/LCOS.
computer science, artificial intelligence
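The abstract only outlines the method. The sketch below illustrates the general idea of borderline oversampling for a single label of a multi-label dataset, written in Python with NumPy. It is a generic Borderline-SMOTE-style procedure and not LCOS: it ignores label correlations entirely. Several parts are assumptions made for illustration and do not come from the paper: the hypothetical function `borderline_oversample`, the "danger" rule (at least half, but not all, of the k nearest neighbours are majority instances), the inverse-distance seed weights, and copying the seed's full label set to each synthetic sample. The authors' repository linked in the abstract holds the actual implementation.

```python
import numpy as np

def borderline_oversample(X, Y, label, k=3, n_new=4, seed=0):
    """Hypothetical Borderline-SMOTE-style oversampler for ONE label (not LCOS)."""
    rng = np.random.default_rng(seed)
    y = Y[:, label]
    minority = np.flatnonzero(y == 1)
    empty = (np.empty((0, X.shape[1])), np.empty((0, Y.shape[1]), dtype=Y.dtype))
    if len(minority) < 2:
        return empty
    # Euclidean distances from every minority instance to every instance.
    d = np.linalg.norm(X[minority, None, :] - X[None, :, :], axis=2)
    d[np.arange(len(minority)), minority] = np.inf  # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]
    n_major = (y[nn] == 0).sum(axis=1)
    # Assumed "danger" rule: at least half, but not all, neighbours are majority.
    border = (n_major >= k / 2) & (n_major < k)
    border_idx = np.flatnonzero(border)  # positions within `minority`
    if border_idx.size == 0:
        return empty
    # Assumed weighting: seeds closer to a majority neighbour are picked more often.
    nb_dist = np.take_along_axis(d[border], nn[border], axis=1)
    dist_to_major = np.where(y[nn[border]] == 0, nb_dist, np.inf).min(axis=1)
    w = 1.0 / (dist_to_major + 1e-12)
    w /= w.sum()
    # Nearest minority neighbours of each minority instance (indices into `minority`).
    k_min = min(k, len(minority) - 1)
    nn_min = np.argsort(d[:, minority], axis=1)[:, :k_min]
    X_new = np.empty((n_new, X.shape[1]))
    Y_new = np.empty((n_new, Y.shape[1]), dtype=Y.dtype)
    for j, p in enumerate(rng.choice(border_idx.size, size=n_new, p=w)):
        i = border_idx[p]
        s = minority[i]
        nb = minority[rng.choice(nn_min[i])]
        # Interpolate on the segment between the seed and a minority neighbour.
        X_new[j] = X[s] + rng.random() * (X[nb] - X[s])
        Y_new[j] = Y[s]  # assumption: copy the seed's full label set
    return X_new, Y_new

# Toy data: rows 0-4 are negative for label 0; rows 5-8 are positive (minority).
X = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [4, 0],
              [4.5, 0], [5.5, 0], [6.5, 0], [10, 0]], dtype=float)
Y = np.array([[0, 0], [0, 1], [0, 0], [0, 1], [0, 0],
              [1, 1], [1, 0], [1, 0], [1, 1]])
X_new, Y_new = borderline_oversample(X, Y, label=0, k=3, n_new=4)
print(X_new, Y_new, sep="\n")
```

On this toy data, only the minority instance at x = 4.5 has two of its three nearest neighbours in the majority class. Every synthetic sample is therefore interpolated from that seed toward one of its minority neighbours at 5.5, 6.5, or 10, and each one inherits the seed's label set [1, 1]. A correlation-aware method such as the one proposed in the paper would replace the single-label danger rule and the naive label copying with choices informed by co-occurring labels.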