Bankruptcy prediction using optimal ensemble models under balanced and imbalanced data

Bahareh Amirshahi, Salim Lahmiri
DOI: https://doi.org/10.1111/exsy.13599
IF: 3.3
2024-03-31
Expert Systems
Abstract: This study explores the performance of gradient boosting methods in bankruptcy prediction for a highly imbalanced dataset. We developed different heterogeneous ensemble models based on three popular gradient boosting methods: XGBoost, LightGBM, and CatBoost. Our ensemble models were optimized using cross-validation, and the results on the hold-out test sets showed that the optimized ensemble models not only outperform their base learners but also improve on the state-of-the-art benchmark results for the same dataset. Interestingly, we observed that data oversampling, a technique commonly used to address class imbalance, had an adverse impact on our ensemble models' performance. This indicates that our models are robust to the imbalanced dataset problem that typically degrades the classification performance of machine learning models.
computer science, artificial intelligence, theory & methods
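The abstract does not say how the ensembles combine their base learners or which metric the cross-validation optimizes. As a rough illustration only, the sketch below assumes weighted soft voting over three base models, with weights picked from a coarse grid to maximize AUC on out-of-fold predictions. The names `y_oof` and `oof_probs`, the grid resolution, and all numbers are hypothetical and are not taken from the paper; in practice the three probability lists would come from cross-validated XGBoost, LightGBM, and CatBoost models.

```python
from itertools import product


def auc(y_true, scores):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def blend(probs, weights):
    """Weighted average of the per-model probability lists (soft voting)."""
    n = len(probs[0])
    return [sum(w * p[i] for w, p in zip(weights, probs)) for i in range(n)]


def simplex_grid(n_models, steps):
    """Yield every weight vector with entries k/steps that sums to 1."""
    for combo in product(range(steps + 1), repeat=n_models):
        if sum(combo) == steps:
            yield tuple(k / steps for k in combo)


def select_weights(y_true, probs, steps=10):
    """Pick the grid weights that maximize AUC of the blended predictions."""
    return max(simplex_grid(len(probs), steps),
               key=lambda w: auc(y_true, blend(probs, w)))


# Hypothetical out-of-fold predictions on an imbalanced toy sample
# (2 bankrupt firms out of 8); the values are made up for illustration.
y_oof = [0, 0, 0, 0, 0, 0, 1, 1]
oof_probs = [
    [0.10, 0.20, 0.05, 0.60, 0.15, 0.30, 0.55, 0.90],  # stand-in for model A
    [0.20, 0.10, 0.15, 0.30, 0.50, 0.25, 0.70, 0.45],  # stand-in for model B
    [0.05, 0.30, 0.10, 0.40, 0.20, 0.65, 0.60, 0.80],  # stand-in for model C
]

best_weights = select_weights(y_oof, oof_probs)
best_auc = auc(y_oof, blend(oof_probs, best_weights))
```

Because the grid includes the one-hot weight vectors, the selected blend can never score below the best single model on the data used for selection; whether that gain carries over to a hold-out test set is exactly what a study like this one has to verify.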