CUS-heterogeneous ensemble-based financial distress prediction for imbalanced dataset with ensemble feature selection

Xudong Du, Wei Li, Sumei Ruan, Li Li
DOI: https://doi.org/10.1016/j.asoc.2020.106758
IF: 8.7
2020-12-01
Applied Soft Computing
Abstract:<p>In the wake of the 2008 global financial crisis, which left a large number of companies in financial distress, machine learning-based prediction of such distress has proven highly practical for economic stakeholders. Most previous machine learning studies focus only on improving sampling methods for imbalanced datasets or on introducing multiple classifiers at the model construction stage. In view of this, this paper attempts to broaden and deepen the use of ensembles to achieve better prediction performance on a severely imbalanced dataset of financial data from Chinese listed companies. For the first time, this paper combines clustering-based under-sampling (CUS) with the gradient boosting decision tree (GBDT) to construct a model, which is used together with the widely used extreme gradient boosting (XGBoost) as heterogeneous classifiers to build a heterogeneous ensemble for financial distress prediction. In addition, following the ensemble idea, this paper selects features with five feature selection methods based on different theoretical backgrounds, and applies ensembles across the whole process of feature selection, data preprocessing, and model construction. In the comparative experiments, the proposed method achieves the best performance on the test set. This study demonstrates the broad applicability of CUS to financial data processing and the superior generalization performance of the ensemble model relative to individual learners.</p>
computer science, artificial intelligence, interdisciplinary applications
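
The abstract names clustering-based under-sampling (CUS) but does not give its configuration, so the following is only a minimal sketch of one common variant, not the authors' implementation. It assumes the majority class is clustered with k-means into as many clusters as there are minority samples, and that each cluster center is replaced by its nearest real majority sample. All function names (`kmeans`, `cluster_undersample`) and the synthetic data are hypothetical and chosen for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns k cluster centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def cluster_undersample(majority, minority, seed=0):
    """Assumed CUS variant: keep one real majority sample per cluster.

    The number of clusters equals the number of minority samples, so the
    returned dataset is balanced. Labels: 0 = majority, 1 = minority.
    """
    centers = kmeans(majority, len(minority), seed=seed)
    remaining = list(range(len(majority)))
    chosen = []
    for center in centers:
        # Pick the not-yet-used majority sample closest to this center.
        best = min(remaining, key=lambda i: squared_distance(majority[i], center))
        chosen.append(majority[best])
        remaining.remove(best)
    features = chosen + list(minority)
    labels = [0] * len(chosen) + [1] * len(minority)
    return features, labels


# Synthetic, illustrative data: 200 majority vs. 10 minority samples.
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(10)]

X_bal, y_bal = cluster_undersample(majority, minority)
print(len(X_bal), sum(y_bal))  # 20 10
```

In a pipeline like the one the abstract describes, the balanced set would then be passed to a gradient boosting learner (for example, scikit-learn's `GradientBoostingClassifier`); that step is omitted here to keep the sketch dependency-free.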