Abstract: A credit default is a negative event for individual borrowers and lenders and, at scale, can affect the broader economy. To protect against losses, lenders seek to maintain robust lending practices by optimising credit risk assessment. The credit risk modelling literature is extensive, yet many studies suffer from flaws that stifle adoption by industry. Nonlinear algorithms are often included for comparison purposes; nevertheless, a comprehensive explanation of the variables and their treatment is not always provided. Furthermore, publicly available data sets do not necessarily contain important financial variables, and their limited sample size raises questions about their statistical power. The focus is commonly on statistical performance rather than on optimising the commercial outcome that motivates the use of credit scorecards, whereas financial institutions are more interested in optimising profits by reducing the cost arising from false assessments of credit applicants. In this research, we simulated important financial variables recommended by industry and appended them to the German Credit Dataset in order to compare optimised credit risk assessment models. The performance of Logistic Regression is compared with that of Random Forest, Gradient Boosting Machine and Artificial Neural Network. A new financial performance metric is developed to estimate the costs arising from both false positives and false negatives, and it is compared with three statistical criteria: Area Under the Receiver Operating Characteristic Curve, Accuracy and Somers' Delta. Four typical asset classes (Credit Cards, Small Loans, Large Loans and Mortgages) are simulated to determine which algorithm performs best in terms of estimated costs. The results demonstrate that the Artificial Neural Network outperforms the other methods, and that our approach ranks models differently from the purely statistical measures. The proposed cost metric would provide a significant financial benefit to organisations.
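
The abstract does not state the formula of the proposed cost metric, so the following is only a minimal illustrative sketch of the general idea: weighting false positives and false negatives by their monetary cost can rank two models differently from accuracy. The function names, the label convention (1 = default, 0 = non-default) and the cost figures are all assumptions made for illustration, not the authors' metric or data.

```python
def misclassification_cost(y_true, y_pred, cost_fp, cost_fn):
    """Total cost of false positives and false negatives.

    Assumed convention: 1 = default, 0 = non-default.
    A false positive rejects a good applicant (lost profit);
    a false negative accepts an applicant who later defaults (credit loss).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp * cost_fp + fn * cost_fn


def accuracy(y_true, y_pred):
    """Share of correctly classified applicants."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical toy data: six applicants, two of whom default.
y_true = [1, 1, 0, 0, 0, 0]
model_a = [0, 1, 0, 0, 0, 0]  # one false negative
model_b = [1, 1, 1, 0, 0, 0]  # one false positive

# Hypothetical unit costs for a single asset class.
COST_FP = 100    # profit forgone on a rejected good applicant
COST_FN = 1000   # loss on an accepted applicant who defaults

for name, pred in [("model_a", model_a), ("model_b", model_b)]:
    print(name,
          "accuracy:", round(accuracy(y_true, pred), 3),
          "cost:", misclassification_cost(y_true, pred, COST_FP, COST_FN))
```

In this toy case both models have identical accuracy, yet the cost-based view prefers the model whose error is the cheaper one. Changing the two unit costs per asset class (for example, a much larger false-negative cost for mortgages than for credit cards) is one simple way such a comparison could be repeated across portfolios.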
