Impact of resampling methods and classification models on the imbalanced credit scoring problems

Jin Xiao, Yadong Wang, Jing Chen, Ling Xie, Jing Huang
DOI: https://doi.org/10.1016/j.ins.2021.05.029
IF: 8.1
2021-08-01
Information Sciences
Abstract: For imbalanced credit scoring, the most common solution is to balance the class distribution of the training set with a resampling method, then train a classification model and use it to classify the customer samples in the test set. However, selecting the most appropriate resampling method and classification model remains difficult, and their optimal combinations have not been identified. Therefore, this study proposes a new framework for comparing benchmark models in imbalanced credit scoring. Within this framework, we introduce the index of balanced accuracy and four other evaluation measures, experimentally compare the performance of 10 benchmark resampling methods and nine benchmark classification models on six credit scoring data sets, and analyze their optimal combinations. The experimental results show that: (1) among the benchmark resampling methods, random under-sampling (a traditional resampling method) and the synthetic minority over-sampling technique combined with Wilson's edited nearest neighbor (an intelligent resampling method) perform best; (2) among the benchmark classification models, logistic regression (a single classification model) and adaptive boosting (an ensemble classification model) perform best; (3) among the combinations, random under-sampling combined with random subspace (an ensemble classification model) obtains the most satisfactory credit scoring performance.
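To make the pipeline in the abstract concrete, the following is a minimal sketch of two of its ingredients: random under-sampling of the training set and the index of balanced accuracy (IBA). It is not the authors' implementation. The function names, the `seed` and `alpha` parameters, and the default `alpha=0.1` are assumptions made for illustration, and the IBA formula used here, (1 + alpha * (TPR - TNR)) * TPR * TNR, is the commonly cited definition rather than one quoted from this paper.

```python
import random


def random_under_sample(X, y, seed=0):
    """Randomly drop samples from larger classes until every class
    has as many samples as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_min))
    kept.sort()  # preserve the original sample order
    return [X[i] for i in kept], [y[i] for i in kept]


def index_of_balanced_accuracy(y_true, y_pred, positive=1, alpha=0.1):
    """IBA = (1 + alpha * (TPR - TNR)) * TPR * TNR.
    Assumes binary labels and that both classes occur in y_true;
    alpha=0.1 is an assumed default, not a value taken from the paper."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tpr = tp / (tp + fn)  # sensitivity on the positive (minority) class
    tnr = tn / (tn + fp)  # specificity on the negative (majority) class
    return (1 + alpha * (tpr - tnr)) * tpr * tnr


# Toy data: eight majority-class customers (0) and two minority-class ones (1).
X_train = [[float(i)] for i in range(10)]
y_train = [0] * 8 + [1] * 2
X_bal, y_bal = random_under_sample(X_train, y_train, seed=42)
```

In a full experiment, the balanced `X_bal` and `y_bal` would be passed to a classifier, and the IBA would be computed on the untouched test set so that the score reflects the original class distribution.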