Resampling ensemble model based on data distribution for imbalanced credit risk evaluation in P2P lending

Kun Niu, Zaimei Zhang, Yan Liu, Renfa Li
DOI: https://doi.org/10.1016/j.ins.2020.05.040
IF: 8.1
2020-10-01
Information Sciences
Abstract: Misclassification of loan applicants by the credit scoring model is one of the main causes of lost investor profits in P2P lending. Class imbalance in credit data is a major factor affecting the model's classification performance. Most existing methods for addressing class imbalance in credit scoring focus on improving prediction accuracy for minority class samples (bad credit), which usually leads to a significant decrease in prediction performance for majority class samples (good credit). In this paper, we propose a novel resampling ensemble model based on data distribution (REMDD) for imbalanced credit risk evaluation in P2P lending. REMDD addresses the class imbalance problem with a proposed undersampling method based on majority class data distribution (UMCDD). To further improve the classification performance of REMDD, base classifiers with better comprehensive performance on the validation set are used for class prediction. We validate the classification performance of REMDD on three real, representative P2P lending credit datasets. The experimental results demonstrate that, compared with existing models, REMDD not only achieves good prediction performance for both the majority and minority classes but also effectively improves comprehensive classification performance for imbalanced credit risk evaluation in P2P lending.
computer science, information systems
What problem does this paper attempt to address?