Abstract: Machine learning and artificial intelligence have achieved human-level performance in many application domains, including image classification, speech recognition and machine translation. In the financial domain, however, expert-based credit risk models still dominate. Establishing meaningful benchmarks and comparisons between machine-learning approaches and human expert-based models is a prerequisite for introducing novel methods. Our main goal in this study is therefore to establish a new benchmark using real consumer data and to provide machine-learning approaches that can serve as baselines on this benchmark. We performed an extensive comparison between machine-learning approaches and a human expert-based model, the FICO credit scoring system, using data from the Survey of Consumer Finances (SCF). Because the SCF data are non-synthetic and contain a large number of real variables, we applied and compared two variable-selection methods to choose the most representative features for effective modelling. The first method combined hypothesis tests, correlation analysis and random forest-based feature importance measures. The second used only a new random forest-based approach (NAP). We then built regression models with machine-learning algorithms ranging from logistic regression and support vector machines to an ensemble of gradient boosted trees and deep neural networks. Our results demonstrate that if lending institutions in 2001 had used their own credit scoring models built with the machine-learning methods explored in this study, their expected credit losses would have been lower and they would have been more sustainable. In addition, the deep neural network and XGBoost models trained on the feature subset selected by NAP achieved the highest area under the curve (AUC) and the highest accuracy, respectively.
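The first variable-selection method combines hypothesis tests, correlation and random-forest importance, after which the selected features feed several classifiers compared on AUC and accuracy. The sketch below shows one way such a filter-then-rank pipeline and comparison could look. It rests on loudly stated assumptions. Synthetic data from `make_classification` stands in for the SCF variables. The Mann-Whitney U test stands in for the unspecified hypothesis tests. The thresholds `alpha`, `corr_threshold` and `top_k` and the helper names `select_features` and `compare_models` are hypothetical. scikit-learn's `GradientBoostingClassifier` substitutes for XGBoost. NAP is not reproduced because the abstract does not describe it.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def select_features(X, y, alpha=0.05, corr_threshold=0.9, top_k=10, seed=0):
    """Hypothetical filter-then-rank selection (thresholds are assumptions).

    1. Hypothesis test: keep features whose distribution differs between
       defaulters (y == 1) and non-defaulters (y == 0), Mann-Whitney U p < alpha.
    2. Correlation filter: skip a feature if |r| > corr_threshold with any
       feature already kept.
    3. Random-forest ranking: keep the top_k survivors by impurity importance.
    """
    candidates = [j for j in range(X.shape[1])
                  if mannwhitneyu(X[y == 1, j], X[y == 0, j]).pvalue < alpha]
    corr = np.corrcoef(X, rowvar=False)  # columns are variables
    kept = []
    for j in candidates:
        if all(abs(corr[j, k]) <= corr_threshold for k in kept):
            kept.append(j)
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    rf.fit(X[:, kept], y)
    order = np.argsort(rf.feature_importances_)[::-1][:top_k]
    return [kept[i] for i in order]  # original column indices, most important first


def compare_models(X, y, features, seed=0):
    """Fit each model on a stratified split; report test AUC and accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[:, features], y, test_size=0.3, stratify=y, random_state=seed)
    models = {
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)),
        # Stand-in for XGBoost (assumption): scikit-learn's own boosted trees.
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        proba = model.predict_proba(X_te)[:, 1]  # predicted probability of default
        results[name] = {
            "auc": roc_auc_score(y_te, proba),
            "accuracy": accuracy_score(y_te, model.predict(X_te)),
        }
    return results


if __name__ == "__main__":
    # Synthetic stand-in for consumer credit data: 2,000 borrowers,
    # 30 variables, roughly 20% defaulters.
    X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                               n_redundant=6, weights=[0.8, 0.2], random_state=0)
    features = select_features(X, y)
    for name, scores in compare_models(X, y, features).items():
        print(f"{name}: AUC={scores['auc']:.3f}, accuracy={scores['accuracy']:.3f}")
```

AUC is computed from the predicted default probabilities, so it measures ranking quality independently of a decision threshold. Accuracy uses the default 0.5 cut-off of `predict`, which is one reason the two metrics can favour different models, as the abstract reports.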

Calibration of Machine Learning Classifiers for Probability of Default Modelling

From Uncertainty to Precision: Enhancing Binary Classifier Performance through Calibration

Approaches for credit scorecard calibration: An empirical analysis

Probabilistic Scores of Classifiers, Calibration is not Enough

Calibration of the rating transition model for high and low default portfolios

Measuring the model risk-adjusted performance of machine learning algorithms in credit default prediction

The Impact of Feature Selection and Transformation on Machine Learning Methods in Determining the Credit Scoring

Accuracy Comparison between Five Machine Learning Algorithms for Financial Risk Evaluation

Decoupling Decision-Making in Fraud Prevention through Classifier Calibration for Business Logic Action

Evaluation metrics and dimensional reduction for binary classification algorithms: a case study on bankruptcy prediction

How Do Machine Learning and Non-Traditional Data Affect Credit Scoring? New Evidence from a Chinese Fintech Firm

Prediction of default probability by using statistical models for rare events

Probability of default for lifetime credit loss for IFRS 9 using machine learning competing risks survival analysis models

Predicting credit default probabilities using machine learning techniques in the face of unequal class distributions

An implementation of ensemble methods, logistic regression, and neural network for default prediction in Peer-to-Peer lending

Classifier Calibration: A survey on how to assess and improve predicted class probabilities

Evaluating Posterior Probabilities: Decision Theory, Proper Scoring Rules, and Calibration

Calibration methods in imbalanced binary classification

Machine learning techniques in joint default assessment

Application of Machine Learning in Credit Risk Scorecard

An Empirical Comparison of Machine-Learning Methods on Bank Client Credit Assessments