Cost-sensitive Semi-Supervised Selective Ensemble Model for Customer Credit Scoring.

Jin Xiao,Xu Zhou,Yu Zhong,Ling Xie,Xin Gu,Dunhu Liu
DOI: https://doi.org/10.1016/j.knosys.2019.105118
IF: 8.139
2019-01-01
Knowledge-Based Systems
Abstract:Only a few customers can be labeled in realistic credit-scoring problems, while many other customers cannot. Further, satisfactory performance is difficult, as traditional supervised learning methods can only use labeled samples to build credit-scoring models. Semi-supervised learning (SSL) can use both labeled and unlabeled samples to solve this problem, but existing credit-scoring research has primarily constructed single semi-supervised models. This study introduces SSL, cost-sensitive learning, a group method of data handling (GMDH), and an ensemble learning technique to propose a GMDH-based cost-sensitive semi-supervised selective ensemble (GCSSE) model. This involves two stages: (1)First, train an ensemble model composed of N base classifiers on the initial training set L with class labels, use it to selectively label the samples from the dataset U without class labels, add them with their predicted labels to the training set, and update the N base classifiers on the new training set; (2)Second, classify L and the test set using the respective trained base classifiers, and construct a cost-sensitive GMDH neural network to obtain the selective ensemble classification results for the test set. Experimental comparisons of five public customer credit score datasets and an empirical analysis of a real customer credit score dataset suggest that this model exhibits the best overall credit-scoring performance compared with one supervised ensemble model and three semi-supervised ensemble models.
What problem does this paper attempt to address?