A two-stage case-based reasoning driven classification paradigm for financial distress prediction with missing and imbalanced data

Lean Yu, Mengxin Li, Xiaojun Liu
DOI: https://doi.org/10.1016/j.eswa.2024.123745
IF: 8.5
2024-03-31
Expert Systems with Applications
Abstract: Financial distress prediction datasets often suffer from missing feature values and from an imbalance between normal and abnormal samples, both of which significantly degrade prediction models. To address these two problems, a two-stage case-based reasoning (CBR)-driven classification paradigm is proposed to predict financial distress accurately and robustly. The paradigm has two main stages: CBR-driven missing data imputation and learning vector quantization (LVQ)-CBR-driven classification. In the first stage, a hybrid CBR-driven weighted imputation method fills in the missing values in the analytical dataset, with the aim of reliable and stable imputation performance. In the second stage, an LVQ-CBR-driven classification model is constructed to predict financial distress. By highlighting and fully learning the minority abnormal samples, the model addresses the low prediction accuracy on those samples that arises from data imbalance. For illustration and verification, experiments are performed on seven datasets of Chinese listed enterprises with different missing and imbalance rates. The results show that, compared with other imputation methods, imbalanced data processing methods, and their combinations, the proposed paradigm achieves the best imputation performance, greatly improves the prediction accuracy on minority abnormal samples, and delivers the best overall prediction performance. This suggests that the proposed two-stage CBR-driven classification paradigm is a competitive solution for financial distress prediction with missing and imbalanced data.
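The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify the hybrid weighting scheme or the LVQ-CBR design, so the code below substitutes a plain distance-weighted nearest-case imputation and a basic LVQ1 prototype classifier. The function names (`cbr_impute`, `train_lvq`, `predict_lvq`), the parameters, and the choice of giving every class the same number of prototypes (one simple way to keep the minority class from being swamped) are all assumptions made for illustration. The imputation step also assumes at least `k` complete rows exist.

```python
import numpy as np

def cbr_impute(X, k=3):
    """Stage 1 (sketch): fill NaNs from the k most similar complete cases."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]  # case base: rows without missing values
    for i, row in enumerate(X):
        missing = np.isnan(row)
        if not missing.any():
            continue
        observed = ~missing
        # Retrieve: Euclidean distance on the features this row actually has.
        dist = np.sqrt(((complete[:, observed] - row[observed]) ** 2).sum(axis=1))
        nearest = np.argsort(dist)[:k]
        # Reuse: closer cases get larger weights (assumed inverse-distance weighting).
        weights = 1.0 / (dist[nearest] + 1e-9)
        filled[i, missing] = weights @ complete[nearest][:, missing] / weights.sum()
    return filled

def train_lvq(X, y, protos_per_class=1, lr=0.1, epochs=30, seed=0):
    """Stage 2 (sketch): LVQ1 with the same number of prototypes for every class."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    protos, labels = [], []
    for c in np.unique(y):
        # Initialise each class's prototypes from its own samples.
        idx = rng.choice(np.flatnonzero(y == c), size=protos_per_class, replace=False)
        protos.append(X[idx])
        labels.extend([c] * protos_per_class)
    protos = np.vstack(protos)
    labels = np.array(labels)
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for i in rng.permutation(len(X)):
            j = np.argmin(((protos - X[i]) ** 2).sum(axis=1))
            # Pull the winning prototype toward a same-class sample, push it away otherwise.
            sign = 1.0 if labels[j] == y[i] else -1.0
            protos[j] += sign * rate * (X[i] - protos[j])
    return protos, labels

def predict_lvq(protos, labels, X):
    """Assign each sample the label of its nearest prototype."""
    X = np.asarray(X, dtype=float)
    d = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return labels[d.argmin(axis=1)]

# Toy usage: 20 "normal" firms, 4 "distressed" firms, one missing value.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (4, 2))])
y_demo = np.array([0] * 20 + [1] * 4)
X_demo[0, 1] = np.nan
X_filled = cbr_impute(X_demo, k=3)
protos, proto_labels = train_lvq(X_filled, y_demo)
predictions = predict_lvq(protos, proto_labels, X_filled)
```

In practice the features would be financial ratios scaled to comparable ranges before any distance is computed, and minority-class performance would be judged with a metric such as recall or G-mean rather than plain accuracy.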
computer science, artificial intelligence; engineering, electrical & electronic; operations research & management science