Abstract: Credit scoring models, which are among the most potent risk management tools that banks and financial institutions rely on, have been a popular subject of research in the past few decades. Accordingly, many approaches have been developed to address the challenges in classifying loan applicants and to improve and facilitate decision-making. The imbalanced nature of credit scoring datasets, as well as the heterogeneous nature of their features, poses difficulties in developing and implementing effective credit scoring models, affecting the generalization power of classification models on unseen data. In this paper, we propose the Bagging Supervised Autoencoder Classifier (BSAC), which mainly leverages the superior performance of the Supervised Autoencoder. Based on the principles of multi-task learning, the Supervised Autoencoder learns low-dimensional embeddings of the input data exclusively with regard to the ultimate classification task of credit scoring. BSAC also addresses the data imbalance problem by employing a variant of the Bagging process based on undersampling of the majority class. The results of our experiments on benchmark and real-life credit scoring datasets illustrate the robustness and effectiveness of the Bagging Supervised Autoencoder Classifier in classifying loan applicants, which can be regarded as a positive development in credit scoring models.

What problem does this paper attempt to address?

This paper attempts to solve two main problems in credit scoring:

1. **Data imbalance problem**: Credit-scoring datasets usually have a severe class-imbalance problem: most samples belong to customers who repay on time (negative class), while a small number belong to default customers (positive class). This imbalance may bias the classification model towards the majority class during training, weakening its ability to identify the minority class (default customers).
2. **Feature representation problem**: Credit-scoring datasets usually contain structured, semi-structured and unstructured data describing different aspects of each loan applicant (such as demographic information, financial situation and behavioral characteristics). To develop robust credit-scoring classification techniques, this complex information needs to be embedded into low-dimensional representations that improve the generalization ability of the model.

To solve these problems, the paper proposes a new method, the **Bagging Supervised Autoencoder Classifier (BSAC)**. Specifically:

- **Supervised Autoencoder**: Following the principle of multi-task learning, the supervised autoencoder keeps the final classification task in focus while learning low-dimensional embeddings. It can therefore extract the latent patterns of the input data and also improve classification performance.
- **Bagging and Undersampling**: To deal with the data imbalance problem, BSAC uses a Bagging process based on undersampling. Because each training subset contains fewer majority-class samples, the model pays more attention to minority-class samples during training, which improves its ability to identify default customers.
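The multi-task idea behind the supervised autoencoder can be written as a single joint objective. The form below is a common textbook formulation, not necessarily the exact loss used in the paper; the symbols $f$ (encoder), $g$ (classification head), $h$ (decoder), $\ell_{\text{cls}}$ (classification loss, for example cross-entropy) and the weight $\lambda$ are all assumptions introduced here for illustration.

```latex
\mathcal{L}(\theta)
  = \frac{1}{n} \sum_{i=1}^{n}
    \Big[
      \underbrace{\ell_{\text{cls}}\big(g(f(x_i)),\, y_i\big)}_{\text{classification task}}
      \;+\;
      \lambda \, \underbrace{\big\lVert h(f(x_i)) - x_i \big\rVert_2^2}_{\text{reconstruction task}}
    \Big]
```

Both terms share the embedding $f(x_i)$, so the low-dimensional representation is shaped by the credit-scoring label as well as by the reconstruction of the input. Under this reading, $\lambda$ controls how strongly the reconstruction term acts as a regularizer.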
In conclusion, the main contribution of this paper lies in combining the advantages of representation learning and ensemble learning to propose an effective credit-scoring classification model that can achieve better classification results on imbalanced datasets.
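The ensemble side of this idea can be sketched in a few lines. The example below is a minimal illustration of undersampling-based bagging, not the paper's implementation: it assumes binary labels, assumes each bag keeps every minority sample plus an equally sized random draw from the majority class, and assumes the ensemble averages predicted default probabilities. The names `balanced_bags` and `average_vote` are hypothetical, and the base learner (a supervised autoencoder in BSAC) is left out.

```python
import random
from collections import Counter


def balanced_bags(labels, n_bags=5, seed=0):
    """Build index lists for undersampled bags (illustrative assumption).

    Each bag holds all minority-class indices plus a random subset of
    majority-class indices of the same size, drawn without replacement.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]

    bags = []
    for _ in range(n_bags):
        # Undersample the majority class down to the minority-class size.
        sampled = rng.sample(majority_idx, k=len(minority_idx))
        bags.append(sorted(minority_idx + sampled))
    return bags


def average_vote(probabilities, threshold=0.5):
    """Average per-model default probabilities and apply a cut-off."""
    mean_prob = sum(probabilities) / len(probabilities)
    return 1 if mean_prob >= threshold else 0


if __name__ == "__main__":
    # Toy labels: 90 non-default (0) and 10 default (1) applicants.
    labels = [0] * 90 + [1] * 10
    for bag in balanced_bags(labels, n_bags=3, seed=1):
        print(len(bag), Counter(labels[i] for i in bag))
```

In a full pipeline, one base classifier would be trained on each bag and `average_vote` would be applied to their outputs for each applicant. Different bags see different majority-class subsets, so the ensemble as a whole still uses much of the majority-class data even though each model trains on a balanced subset.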

Bagging Supervised Autoencoder Classifier for Credit Scoring

BACS: blockchain and AutoML-based technology for efficient credit scoring classification

Dynamic Ensemble Learning for Credit Scoring: A Comparative Study

Empirical Evaluation of Ensemble Learning for Credit Scoring

An Online Transfer Learning Framework With Extreme Learning Machine for Automated Credit Scoring

A ResNet-LSTM Based Credit Scoring Approach for Imbalanced Data

Multiple optimized ensemble learning for high-dimensional imbalanced credit scoring datasets

A DBN-based resampling SVM ensemble learning paradigm for credit classification with imbalanced data

A deep learning model for behavioural credit scoring in banks

An Integrated Machine Learning and Deep Learning Framework for Credit Card Approval Prediction

A New Hybrid Credit Scoring Ensemble Model with Feature Enhancement and Soft Voting Weight Optimization

Intelligent credit scoring using deep learning methods

A Novel Multi-Stage Ensemble Model With a Hybrid Genetic Algorithm for Credit Scoring on Imbalanced Data

A multi-level classification based ensemble and feature extractor for credit risk assessment

Empirical Analysis of Ensemble Learning for Imbalanced Credit Scoring Datasets: A Systematic Review

Predicting Credit Risk for Unsecured Lending: A Machine Learning Approach

Credit Scoring Models Using Ensemble Learning and Classification Approaches: A Comprehensive Survey

Enhancing credit risk prediction with hybrid deep learning and sand cat swarm feature selection

A novel SSA-CatBoost machine learning model for credit rating

Feature Enhanced Ensemble Modeling with Voting Optimization for Credit Risk Assessment

Credit card score prediction using machine learning models: A new dataset