A systematic review for class-imbalance in semi-supervised learning
Willian Dihanster Gomes de Oliveira, Lilian Berton
DOI: https://doi.org/10.1007/s10462-023-10579-0
IF: 9.588
2023-09-04
Artificial Intelligence Review
Abstract: This review examines the state of the art of semi-supervised learning (SSL) techniques for addressing class-imbalanced data. Class imbalance is inherent in many real-world applications and has been extensively investigated in supervised classification. In a semi-supervised scenario, the problem is even more interesting because two outcomes are possible: the imbalance degrades performance and the error propagates to the unlabeled data, worsening the final performance, or the unlabeled data help to represent the minority class and improve the results. However, as far as we know, no survey organizes the semi-supervised approaches for dealing with class imbalance. Our goal is to fill this gap with a systematic review, for which we retrieved 444 articles published over five years (2017–2021) from ACM Digital Library, IEEE Xplore, Elsevier, Springer, and Google Scholar. After applying exclusion criteria, 47 articles were selected and are presented in more detail. We collect information to answer four research questions, covering the existence of pre/post-processing techniques, the applications and data sets explored, the metrics used to evaluate the approaches, and the techniques developed to deal with class imbalance. We propose eight categories (balancing, graph-based, loss, self-training, ensemble, active learning, post-processing, and other types of learning) to organize the different methodological approaches from the papers. Finally, we present some discussion and future trends in the area. Our review aims to provide an understanding of the most prominent and currently relevant work employing SSL for class imbalance.
computer science, artificial intelligence