Abstract: Predicting credit default risk is important to financial institutions, because accurately estimating the likelihood that a borrower will default on a loan helps to reduce financial losses and so maintain profitability and stability. Although machine learning models have been used to assess large volumes of applications with complex attributes, there is still a need to identify the most effective techniques for the model development process, including techniques for addressing data imbalance. In this research, we conducted a comparative analysis of random forest, decision tree, Support Vector Machines (SVMs), Extreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost) and the multilayer perceptron for predicting credit defaults, using loan data from LendingClub. XGBoost was then used as a framework for testing and evaluating various techniques. In particular, we applied this XGBoost framework to the class imbalance observed in the data by testing several resampling methods: Random Over-Sampling (ROS), the Synthetic Minority Over-Sampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), Random Under-Sampling (RUS), and hybrid approaches such as SMOTE with Tomek Links and SMOTE with Edited Nearest Neighbours (SMOTE + ENNs). The results showed that models trained on the balanced datasets significantly outperformed those trained on the imbalanced dataset, with SMOTE + ENNs delivering the best overall performance: an accuracy of 90.49%, a precision of 94.61% and a recall of 92.02%. Ensemble methods such as voting and stacking were then employed to enhance performance further. Our proposed model achieved an accuracy of 93.7%, a precision of 95.6% and a recall of 95.5%, which shows the potential of ensemble methods for improving credit default predictions and for giving lending platforms a tool to reduce default rates and financial losses. The findings from this study have broader implications for financial institutions, offering a robust approach to risk assessment beyond the LendingClub dataset.
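To make the resampling idea and the reported metrics concrete, the following is a minimal sketch of the two simplest methods named above, ROS and RUS, together with accuracy, precision and recall computed from prediction counts. It is not the study's implementation: the function names (`random_over_sample`, `random_under_sample`, `binary_metrics`) and the toy data are hypothetical, only the Python standard library is used, and SMOTE, ADASYN, the ENN and Tomek Links cleaning steps, and the XGBoost framework are not reproduced here.

```python
import random
from collections import Counter


def random_over_sample(rows, labels, seed=0):
    """ROS sketch: duplicate rows of smaller classes at random
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_under_sample(rows, labels, seed=0):
    """RUS sketch: keep a random subset of each class
    so every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical toy data: 8 repaid loans (label 0) and 2 defaults (label 1).
toy_rows = [[i] for i in range(10)]
toy_labels = [0] * 8 + [1] * 2

_, ros_labels = random_over_sample(toy_rows, toy_labels)
_, rus_labels = random_under_sample(toy_rows, toy_labels)
print(Counter(ros_labels))  # 8 rows of each class
print(Counter(rus_labels))  # 2 rows of each class
print(binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

A common practice, assumed here rather than stated in the abstract, is to resample only the training split so that the evaluation data keeps the original class ratio.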

Feature Selection and Sensitivity Analysis of Oversampling in Big and Highly Imbalanced Bank's Credit Data

Enhancing Supervised Model Performance in Credit Risk Classification Using Sampling Strategies and Feature Ranking

The Impact of Feature Selection and Transformation on Machine Learning Methods in Determining the Credit Scoring

Reinforcement of the Bank Loan Model using the Feature Selection Method of Machine Learning

Intelligent Model for Enhancing the Bankruptcy Prediction with Imbalanced Data Using Oversampling and CatBoost

Bank Loan Prediction Using Machine Learning Techniques

Hybrid Undersampling and Oversampling for Handling Imbalanced Credit Card Data

Credit risk assessment for unbalanced datasets based on data mining, artificial neural network and support vector machines

Resampling Techniques Study on Class Imbalance Problem in Credit Risk Prediction

The effect of feature extraction and data sampling on credit card fraud detection

Machine Learning for an Enhanced Credit Risk Analysis: A Comparative Study of Loan Approval Prediction Models Integrating Mental Health Data

Unbalanced Credit Card Fraud Detection Data: A Machine Learning-Oriented Comparative Study of Balancing Techniques

MACHINE LEARNING-BASED APPROACHES FOR CREDIT CARD DEBT PREDICTION

Empirical Analysis of Ensemble Learning for Imbalanced Credit Scoring Datasets: A Systematic Review

Predicting credit risk on the basis of financial and non-financial variables and data mining

Machine Learning Models Evaluation and Feature Importance Analysis on NPL Dataset

Performance Evaluation of Machine Learning Methods for Credit Card Fraud Detection Using SMOTE and AdaBoost

Multiple optimized ensemble learning for high-dimensional imbalanced credit scoring datasets

Advanced User Credit Risk Prediction Model using LightGBM, XGBoost and Tabnet with SMOTEENN

Ensemble-Based Machine Learning Algorithm for Loan Default Risk Prediction

The Credit Risk Problem—A Developing Country Case Study