Dynamic self-paced sampling ensemble for highly imbalanced and class-overlapped data classification

Fang Zhou, Suting Gao, Lyu Ni, Martin Pavlovski, Qiwen Dong, Zoran Obradovic, Weining Qian
DOI: https://doi.org/10.1007/s10618-022-00838-z
IF: 5.406
2022-06-19
Data Mining and Knowledge Discovery
Abstract: Datasets with imbalanced class distributions arise in many real-world applications. Many approaches have been proposed to address the class imbalance challenge, but most of them perform poorly when datasets are characterized by high class imbalance, class overlap, and low data quality. In this study, we propose an effective meta-framework for highly imbalanced and overlapped classification, called DAPS (DynAmic self-Paced sampling enSemble), which (1) leverages reasonable and effective sampling to maximize the utilization of informative instances and to avoid serious information loss and (2) assigns proper instance weights to address the issue of noisy data. Furthermore, most existing canonical classifiers (e.g., Decision Tree, Random Forest) can be integrated into DAPS. Comprehensive experimental results on synthetic datasets and three real-world datasets show that DAPS obtains considerable improvements in F1-score compared with a broad range of published models.
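The two ingredients named in the abstract, sampling-based ensembling and per-instance weights, can be illustrated with a generic balanced under-sampling ensemble. The sketch below is not the DAPS algorithm: the abstract does not specify its sampling schedule or weighting rule, so the 1-D nearest-centroid base learner, the uniform weights, the toy data, and all function names here are assumptions chosen only to keep the example self-contained.

```python
import random

def fit_centroids(points, labels, weights):
    """Fit a 1-D nearest-centroid model: weighted mean of the feature per class."""
    centroids = {}
    for cls in (0, 1):
        total = sum(w for y, w in zip(labels, weights) if y == cls)
        weighted_sum = sum(x * w for x, y, w in zip(points, labels, weights) if y == cls)
        centroids[cls] = weighted_sum / total
    return centroids

def predict_one(centroids, x):
    """Predict the class whose centroid is closer to x (ties go to class 0)."""
    return 1 if abs(x - centroids[1]) < abs(x - centroids[0]) else 0

def undersampling_ensemble(points, labels, weights, n_models=10, seed=0):
    """Train n_models base learners, each on all minority instances plus an
    equally sized random subset of the majority class (label 1 = minority)."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    models = []
    for _ in range(n_models):
        chosen = minority + rng.sample(majority, len(minority))
        models.append(fit_centroids(
            [points[i] for i in chosen],
            [labels[i] for i in chosen],
            [weights[i] for i in chosen],
        ))
    return models

def ensemble_predict(models, x):
    """Majority vote over the base learners."""
    votes = sum(predict_one(m, x) for m in models)
    return 1 if votes * 2 > len(models) else 0

# Toy data (assumed): 200 majority points in [-1, 0.99] and 10 minority points
# in [0.5, 3.2], so the classes overlap between 0.5 and 0.99.
points = [i * 0.01 - 1 for i in range(200)] + [0.5 + 0.3 * i for i in range(10)]
labels = [0] * 200 + [1] * 10
# Uniform weights here; lowering the weight of a suspected noisy instance
# would reduce its influence on every centroid it contributes to.
weights = [1.0] * len(points)

models = undersampling_ensemble(points, labels, weights)
print(ensemble_predict(models, 5.0), ensemble_predict(models, -3.0))
```

Each base learner sees a balanced subset, so no single model is dominated by the majority class, while the ensemble as a whole still draws on many different majority instances. Because `fit_centroids` is the only learner-specific piece, another classifier that accepts instance weights could be substituted for it.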
computer science, information systems, artificial intelligence