Can machine learning paradigm improve attribute noise problem in credit risk classification?

Lean Yu, Xiaowen Huang, Hang Yin
DOI: https://doi.org/10.1016/j.iref.2020.08.016
2020-11-01
Abstract:<p>In this paper, a dual-voting-based learning paradigm is proposed to address the attribute noise problem in credit risk classification. The paradigm has three stages. In the first stage, four indexes are introduced to evaluate the noise level of attributes. In the second stage, attributes with different noise levels are divided into different attribute sets according to the dual-voting results on noise level. In the third stage, credit datasets with different attribute sets are processed with different learning strategies and different de-noising methods for comparison. A classification and regression tree (CART) model is adopted as the generic classifier to evaluate performance on the training datasets generated by the different learning strategies and noise reduction methods. In addition, the performance of all learning strategies on sparse data with attribute noise is discussed. Experimental results show that the proposed learning paradigm outperforms the benchmarks in solving the attribute noise problem, not only in accuracy and its stability but also in speed. Further analysis indicates that sparse data with attribute noise can further improve the stability of accuracy for a specific de-noising method. This implies that the proposed dual-voting-based learning paradigm is a promising solution to attribute noise reduction in credit risk classification.</p>
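The first two stages described in the abstract (score each attribute with several noise indexes, then split attributes into sets by voting) can be illustrated with a small sketch. The abstract does not name the four indexes or spell out the dual-voting rule, so everything here is an assumption: the index names, attribute names, and scores are made up, higher scores are assumed to mean more noise, and a single majority vote with a median cutoff stands in for the paper's dual-voting procedure.

```python
from statistics import median


def partition_by_votes(index_scores, min_votes=3):
    """Split attributes into low-noise and high-noise sets by majority vote.

    index_scores maps an index name to {attribute: score}. A higher score
    is ASSUMED to mean more noise. Each index casts one "noisy" vote for an
    attribute whose score is above that index's median across attributes.
    This is a hypothetical stand-in, not the paper's dual-voting rule.
    """
    attributes = list(next(iter(index_scores.values())))
    votes = {a: 0 for a in attributes}
    for scores in index_scores.values():
        cutoff = median(scores.values())
        for a in attributes:
            if scores[a] > cutoff:
                votes[a] += 1
    high = [a for a in attributes if votes[a] >= min_votes]
    low = [a for a in attributes if votes[a] < min_votes]
    return low, high, votes


# Illustrative, made-up scores from four hypothetical noise indexes.
example_scores = {
    "index_a": {"income": 0.1, "age": 0.2, "debt": 0.8, "tenure": 0.9},
    "index_b": {"income": 0.3, "age": 0.1, "debt": 0.7, "tenure": 0.6},
    "index_c": {"income": 0.2, "age": 0.6, "debt": 0.9, "tenure": 0.1},
    "index_d": {"income": 0.1, "age": 0.3, "debt": 0.5, "tenure": 0.7},
}

low_noise, high_noise, vote_counts = partition_by_votes(example_scores)
print("low-noise attributes:", low_noise)
print("high-noise attributes:", high_noise)
```

In the third stage, each attribute set would then be handled by its own learning strategy or de-noising method before a CART classifier is trained; scikit-learn's `DecisionTreeClassifier` is one readily available CART-style implementation that could play that role in an experiment of this kind.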
economics, business, finance