Optimizing Ensemble Learning to Reduce Misclassification Costs in Credit Risk Scorecards

John Martin, Sona Taheri, Mali Abdollahian

DOI: https://doi.org/10.3390/math12060855

IF: 2.4

2024-03-15

Mathematics

Abstract: Credit risk scorecard models are utilized by lending institutions to optimize decisions on credit approvals. In recent years, ensemble learning has often been deployed to reduce misclassification costs in credit risk scorecards. In this paper, we compared the risk estimation of 26 widely used machine learning algorithms based on commonly used statistical metrics. The best-performing algorithms were then used for model selection in ensemble learning. For the first time, we proposed financial criteria that assess the impact of losses associated with both false positive and false negative predictions to identify optimal ensemble learning. The German Credit Dataset (GCD) is augmented with simulated financial information according to a hypothetical mortgage portfolio observed in UK, European and Australian banks to enable the assessment of losses arising from misclassification costs. The experimental results using the simulated GCD show that the best predictive individual algorithm, with an accuracy of 0.87, a Gini of 0.88 and an Area Under the Receiver Operating Characteristic Curve of 0.94, was the Generalized Additive Model (GAM). The ensemble learning method with the lowest misclassification cost was the combination of Random Forest (RF) and K-Nearest Neighbors (KNN), totaling USD 417 million in costs (USD 230 million for default costs and USD 187 million for opportunity costs) compared to the costs of the GAM (USD 487 million in total, USD 287 million for default costs and USD 200 million for opportunity costs). Implementing the proposed financial criteria has led to a significant USD 70 million reduction in misclassification costs derived from a small sample. Thus, the lending institutions' profit would considerably rise as the number of submitted credit applications for approval increases.
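The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes the usual relation Gini = 2 × AUC − 1 (the abstract does not state which definition of Gini was used) and reads all cost figures as millions of USD. The names `gini_from_auc` and `costs_musd` are illustrative, not taken from the paper.

```python
def gini_from_auc(auc: float) -> float:
    """Gini coefficient under the common convention Gini = 2 * AUC - 1 (an assumption here)."""
    return 2.0 * auc - 1.0


# Cost components reported in the abstract, read as millions of USD.
costs_musd = {
    "RF+KNN": {"default": 230, "opportunity": 187},
    "GAM": {"default": 287, "opportunity": 200},
}

# Total misclassification cost per model = default cost + opportunity cost.
totals = {name: c["default"] + c["opportunity"] for name, c in costs_musd.items()}

# Reduction obtained by choosing the ensemble over the single best-scoring model.
saving = totals["GAM"] - totals["RF+KNN"]

print(round(gini_from_auc(0.94), 2), totals, saving)
```

Under these assumptions the reported Gini of 0.88 agrees with the reported AUC of 0.94, and the two totals differ by the USD 70 million stated in the abstract.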


What problem does this paper attempt to address?

This paper aims to reduce misclassification costs in credit risk scorecards. Specifically, the authors compared the risk estimates of 26 widely used machine learning algorithms and evaluated them using commonly used statistical metrics. They selected the best-performing algorithms for ensemble learning and, for the first time, proposed financial criteria that measure the losses caused by misclassification in order to identify the optimal ensemble learning method. In this way, the research aims to improve the quality of credit decisions, thereby reducing the costs that arise from wrongly approving or rejecting customers and ultimately increasing the profitability of lending institutions. The paper uses the German Credit Dataset (GCD) and simulates financial variables according to hypothetical mortgage portfolio sizes observed in UK, European and Australian banks in order to evaluate misclassification costs. The experimental results show that, among the ensemble learning methods, the combination of Random Forest (RF) and K-Nearest Neighbors (KNN) yields the lowest misclassification cost.
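A minimal sketch of cost-based model selection, in the spirit of the financial criteria described above, is given here. It is not the authors' implementation. The `Loan` fields, the per-loan amounts and the two candidate decision vectors are hypothetical, and the sketch simply assumes that an approved loan that defaults incurs a default cost while a rejected loan that would have been repaid incurs an opportunity cost.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class Loan:
    defaulted: bool          # observed outcome
    loss_if_default: float   # hypothetical loss if the loan is approved and defaults
    profit_if_repaid: float  # hypothetical profit foregone if a good loan is rejected


def misclassification_cost(loans: Sequence[Loan], approve: Sequence[bool]) -> Tuple[float, float]:
    """Return (default_cost, opportunity_cost) for one vector of approval decisions."""
    default_cost = 0.0
    opportunity_cost = 0.0
    for loan, approved in zip(loans, approve):
        if approved and loan.defaulted:
            default_cost += loan.loss_if_default        # wrongly approved
        elif not approved and not loan.defaulted:
            opportunity_cost += loan.profit_if_repaid   # wrongly rejected
    return default_cost, opportunity_cost


def select_cheapest(loans: Sequence[Loan], candidates: Dict[str, Sequence[bool]]) -> str:
    """Pick the candidate model whose decisions give the lowest total cost."""
    return min(candidates, key=lambda name: sum(misclassification_cost(loans, candidates[name])))


# Toy portfolio with made-up amounts, for illustration only.
loans = [
    Loan(True, 100.0, 0.0),
    Loan(False, 0.0, 20.0),
    Loan(False, 0.0, 30.0),
    Loan(True, 80.0, 0.0),
]
candidates = {
    "model_a": [True, True, True, False],    # approves one defaulter
    "model_b": [False, False, True, False],  # rejects one good customer
}
best = select_cheapest(loans, candidates)
```

In this toy example the model that wrongly rejects a good customer is cheaper than the one that wrongly approves a defaulter. This illustrates how a ranking by monetary cost can differ from a ranking by accuracy, since both candidates here make exactly one error.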
