CCkEL: Compensation-based correlated k-labelsets for classifying imbalanced multi-label data

Qianpeng Xiao, Changbin Shao, Sen Xu, Xibei Yang, Hualong Yu
DOI: https://doi.org/10.3934/era.2024139
2024-01-01
Electronic Research Archive
Abstract: Imbalanced data distribution and label correlation are two intrinsic characteristics of multi-label data. In such data, instances associated with certain labels may be sparse, and some labels may be correlated with others, which poses a challenge for traditional machine learning techniques. To handle imbalanced data distribution and label correlation simultaneously, this study proposes a novel algorithm called compensation-based correlated k-labelsets (CCkEL). First, for each label, CCkEL selects the k-1 most strongly correlated labels in the label space to constitute multiple correlated k-labelsets; this improves its efficiency in comparison with the random k-labelsets (RAkEL) algorithm. Then, CCkEL transforms each k-labelset into a multiclass problem. Finally, it uses a fast decision output compensation strategy to address class imbalance in the decoded multi-label decision space. We compared the performance of the proposed CCkEL algorithm with that of multiple popular multi-label imbalance learning algorithms on 10 benchmark multi-label datasets, and the results show its effectiveness and superiority.
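The first two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which correlation measure CCkEL uses, so absolute Pearson correlation between label columns is assumed here, and the function names `correlated_k_labelsets` and `to_multiclass` are hypothetical. The decision output compensation step is left out because the abstract does not describe it in enough detail.

```python
import numpy as np


def correlated_k_labelsets(Y, k):
    """For each label, group it with its k-1 most correlated labels.

    Y is an (n_samples, n_labels) binary indicator matrix.
    Assumption: correlation is measured as absolute Pearson correlation;
    the paper may use a different measure.
    """
    Y = np.asarray(Y, dtype=float)
    n_labels = Y.shape[1]
    corr = np.abs(np.corrcoef(Y, rowvar=False))
    corr = np.nan_to_num(corr)        # constant label columns yield NaN
    np.fill_diagonal(corr, -np.inf)   # a label is never its own partner
    labelsets = []
    for j in range(n_labels):
        partners = np.argsort(-corr[j], kind="stable")[: k - 1]
        labelsets.append((j, *sorted(int(p) for p in partners)))
    return labelsets


def to_multiclass(Y, labelset):
    """Label-powerset encoding: each distinct 0/1 pattern over the
    labelset is mapped to one integer class id."""
    sub = np.asarray(Y, dtype=int)[:, list(labelset)]
    weights = 1 << np.arange(sub.shape[1])  # 1, 2, 4, ...
    return sub @ weights


# Toy data: 5 instances, 3 labels (label 2 is the complement of label 0).
Y = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 1],
              [1, 0, 0]])
labelsets = correlated_k_labelsets(Y, k=2)
classes = to_multiclass(Y, labelsets[0])
```

Each resulting class vector can then be passed to any multiclass learner, one per k-labelset, which yields one labelset per label rather than a random sample of labelsets as in RAkEL.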
mathematics