Classification performance assessment for imbalanced multiclass data

Jesús S. Aguilar-Ruiz, Marcin Michalak
DOI: https://doi.org/10.1038/s41598-024-61365-z
IF: 4.6
2024-05-11
Scientific Reports
Abstract: The evaluation of diagnostic systems is pivotal for ensuring the deployment of high-quality solutions, especially given the pronounced context-sensitivity of certain systems, particularly in fields such as biomedicine. Of notable importance are predictive models where the target variable can encompass multiple values (multiclass), especially when these classes exhibit substantial frequency disparities (imbalance). In this study, we introduce the Imbalanced Multiclass Classification Performance (IMCP) curve, specifically designed for multiclass datasets (unlike the ROC curve), and characterized by its resilience to class distribution variations (in contrast to accuracy or Fβ-score). Moreover, the IMCP curve facilitates individual performance assessment for each class within the diagnostic system, shedding light on the confidence associated with each prediction, an aspect of particular significance in medical diagnosis. Empirical experiments conducted with real-world data in a multiclass context (involving 35 types of tumors) featuring a high level of imbalance demonstrate that both the IMCP curve and the area under the IMCP curve serve as excellent indicators of classification quality.
multidisciplinary sciences
What problem does this paper attempt to address?
This paper addresses how to evaluate diagnostic systems on imbalanced multiclass classification data. In sensitive fields such as medical diagnosis, the reliability of predictive models is crucial, particularly when the number of samples differs substantially between classes (imbalanced data). The authors propose a new evaluation tool, the Imbalanced Multiclass Classification Performance (IMCP) curve, which is designed specifically for multiclass datasets and, unlike accuracy or the Fβ-score, is robust to changes in the class distribution. The IMCP curve evaluates the performance of each class individually and reveals the confidence associated with each prediction, which is particularly meaningful in medical diagnosis.

The paper points out that traditional metrics such as accuracy, precision, and the Fβ-score are sensitive to the class distribution of the data. To mitigate imbalance, researchers often apply undersampling or oversampling. However, these techniques only change the training data; the test data remain imbalanced, so evaluation metrics that can handle imbalance are still needed.

The authors demonstrate the effectiveness of the IMCP curve and the area under it, AU(IMCP), as indicators of classification quality on highly imbalanced real-world multiclass data covering 35 tumor types. The experimental results show that the IMCP curve and AU(IMCP) better reflect classifier performance, whereas traditional measures such as accuracy can be misleadingly high on imbalanced data.

In summary, the paper seeks a performance evaluation method for imbalanced multiclass classification that is not affected by the class distribution, so that the quality of predictive models can be measured accurately, particularly in applications such as medical diagnosis.
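The claim that accuracy and the Fβ-score depend on the class distribution can be checked numerically. The sketch below is not the IMCP curve, whose construction is not described here. Instead it uses macro-averaged recall (balanced accuracy) as a simple stand-in for a distribution-insensitive measure. It assumes a hypothetical three-class classifier whose per-class behaviour (the rows of `rates`, P(predicted j | true i)) stays fixed while only the class counts in the test set change. All names and numbers are illustrative.

```python
import numpy as np


def confusion_from_rates(rates, counts):
    """Expected confusion matrix: row i = true class i, column j = predicted class j.

    rates[i, j] is P(predict j | true i); counts[i] is the number of test samples of class i.
    """
    return rates * np.asarray(counts, dtype=float)[:, None]


def accuracy(cm):
    # Fraction of all test samples on the diagonal: weighted by class frequency.
    return np.trace(cm) / cm.sum()


def macro_recall(cm):
    # Mean of per-class recall: every class weighs the same regardless of its size.
    return np.mean(np.diag(cm) / cm.sum(axis=1))


def f1_per_class(cm):
    # F1 (F-beta with beta = 1) per class; precision uses column sums, so it reacts to the class mix.
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)
    recall = tp / cm.sum(axis=1)
    return 2 * precision * recall / (precision + recall)


# Hypothetical classifier: good on class 0, weaker on classes 1 and 2.
rates = np.array([
    [0.95, 0.03, 0.02],
    [0.30, 0.60, 0.10],
    [0.40, 0.10, 0.50],
])

balanced = confusion_from_rates(rates, [100, 100, 100])
imbalanced = confusion_from_rates(rates, [900, 60, 40])

for name, cm in [("balanced", balanced), ("imbalanced", imbalanced)]:
    print(f"{name:10s} acc={accuracy(cm):.3f} "
          f"macro_recall={macro_recall(cm):.3f} F1(class 2)={f1_per_class(cm)[2]:.3f}")
# balanced   acc=0.683 macro_recall=0.683 F1(class 2)=0.617
# imbalanced acc=0.911 macro_recall=0.683 F1(class 2)=0.476
```

The classifier is identical in both cases, yet accuracy rises from 0.683 to 0.911 simply because the easy majority class dominates the imbalanced test set, and the minority-class F1 drops because its precision is diluted by misclassified majority samples. Macro-averaged recall is unchanged, which is the kind of distribution insensitivity the IMCP curve is designed to provide, although the IMCP curve additionally accounts for the confidence of each prediction.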