Optimizing Ensemble Learning to Reduce Misclassification Costs in Credit Risk Scorecards

John Martin, Sona Taheri, Mali Abdollahian
DOI: https://doi.org/10.3390/math12060855
IF: 2.4
2024-03-15
Mathematics
Abstract: Credit risk scorecard models are utilized by lending institutions to optimize decisions on credit approvals. In recent years, ensemble learning has often been deployed to reduce misclassification costs in credit risk scorecards. In this paper, we compared the risk estimation of 26 widely used machine learning algorithms based on commonly used statistical metrics. The best-performing algorithms were then used for model selection in ensemble learning. For the first time, we proposed financial criteria that assess the impact of losses associated with both false positive and false negative predictions to identify optimal ensemble learning. The German Credit Dataset (GCD) is augmented with simulated financial information according to a hypothetical mortgage portfolio observed in UK, European and Australian banks to enable the assessment of losses arising from misclassification costs. The experimental results using the simulated GCD show that the best predictive individual algorithm was the Generalized Additive Model (GAM), with an accuracy of 0.87, a Gini of 0.88 and an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.94. The ensemble learning method with the lowest misclassification cost was the combination of Random Forest (RF) and K-Nearest Neighbors (KNN), totaling USD 417 million in costs (USD 230 million for default costs and USD 187 million for opportunity costs) compared to the costs of the GAM (USD 487 million, USD 287 million and USD 200 million, respectively). Implementing the proposed financial criteria has led to a significant USD 70 million reduction in misclassification costs derived from a small sample. Thus, the lending institutions' profit would considerably rise as the number of submitted credit applications for approval increases.
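The figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is not taken from the paper; it only assumes that total misclassification cost is the sum of default and opportunity costs, and it uses the standard identity Gini = 2 * AUC - 1. The function and variable names are illustrative.

```python
# Figures as reported in the abstract, in USD million.
reported = {
    "RF+KNN": {"default": 230, "opportunity": 187, "total": 417},
    "GAM": {"default": 287, "opportunity": 200, "total": 487},
}


def total_cost(default_cost, opportunity_cost):
    """Total misclassification cost, assumed to be the sum of both parts."""
    return default_cost + opportunity_cost


def gini_from_auc(auc):
    """Standard relation between the Gini coefficient and the AUC."""
    return 2 * auc - 1


# Check that each reported total equals the sum of its two components.
for name, cost in reported.items():
    assert total_cost(cost["default"], cost["opportunity"]) == cost["total"]

# Cost reduction of the RF+KNN ensemble relative to the GAM.
saving = reported["GAM"]["total"] - reported["RF+KNN"]["total"]

# Gini implied by the reported AUC of the GAM, rounded to two decimals.
gam_gini = round(gini_from_auc(0.94), 2)

print(saving, gam_gini)
```

Under these assumptions the components add up to the stated totals, the difference matches the USD 70 million reduction, and the reported Gini of 0.88 is consistent with the reported AUC of 0.94.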
What problem does this paper attempt to address?
This paper addresses the problem of reducing misclassification costs in credit risk scorecards. The authors compared the risk estimates of 26 widely used machine learning algorithms using commonly used statistical metrics, then selected the best-performing algorithms as candidates for ensemble learning. For the first time, they proposed financial criteria that measure the losses caused by misclassification and used these criteria to identify the optimal ensemble learning method. The aim is to improve the quality of credit decisions, reducing the costs of wrongly approving or rejecting customers and ultimately increasing the profitability of lending institutions. To evaluate misclassification costs, the paper augments the German Credit Dataset (GCD) with financial variables simulated according to a hypothetical mortgage portfolio observed in UK, European and Australian banks. The experimental results show that, among the ensemble learning methods, the combination of Random Forest (RF) and K-Nearest Neighbors (KNN) yields the lowest misclassification cost.
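The idea of selecting a model by financial cost rather than by a statistical metric can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the summary does not say how the ensemble members are combined, so averaging predicted probabilities (soft voting) is assumed here, and the per-applicant amounts, the 0.5 threshold, the model names and all function names are hypothetical toy values.

```python
def misclassification_cost(y_true, y_pred, loss_if_default, profit_if_good):
    """Return (default_cost, opportunity_cost).

    Assumed convention: 1 = defaulter, 0 = non-defaulter;
    a prediction of 1 means the application is rejected.
    """
    default_cost = 0.0
    opportunity_cost = 0.0
    rows = zip(y_true, y_pred, loss_if_default, profit_if_good)
    for actual, predicted, loss, profit in rows:
        if actual == 1 and predicted == 0:
            default_cost += loss  # approved a customer who defaults
        elif actual == 0 and predicted == 1:
            opportunity_cost += profit  # rejected a customer who would repay
    return default_cost, opportunity_cost


def to_labels(probs, threshold=0.5):
    """Turn predicted default probabilities into reject (1) / approve (0)."""
    return [1 if p >= threshold else 0 for p in probs]


def soft_vote(prob_lists):
    """Average the default probabilities of several models per applicant."""
    n = len(prob_lists)
    return [sum(p) / n for p in zip(*prob_lists)]


# Toy data (hypothetical): true outcomes and per-applicant amounts.
y_true = [1, 0, 1, 0, 0, 1]
loss_if_default = [100, 80, 120, 90, 60, 150]
profit_if_good = [10, 8, 12, 9, 6, 15]

# Hypothetical default probabilities from two fitted models.
probs = {
    "model_a": [0.9, 0.6, 0.4, 0.2, 0.1, 0.7],
    "model_b": [0.7, 0.2, 0.7, 0.3, 0.6, 0.8],
}
probs["model_a+model_b"] = soft_vote([probs["model_a"], probs["model_b"]])

# Score every candidate by its financial cost and keep the cheapest one.
costs = {
    name: misclassification_cost(
        y_true, to_labels(p), loss_if_default, profit_if_good
    )
    for name, p in probs.items()
}
best = min(costs, key=lambda name: sum(costs[name]))
print(costs, best)
```

In this toy example the averaged model makes no costly errors, so it is selected. On real data the ranking by cost can differ from the ranking by accuracy or AUC, which is the motivation the summary gives for using financial criteria.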