Integrating data augmentation and hybrid feature selection for small sample credit risk assessment with high dimensionality

Xiaoming Zhang, Lean Yu, Hang Yin, Kin Keung Lai
DOI: https://doi.org/10.1016/j.cor.2022.105937
2022-10-01
Abstract: Data scarcity is a serious issue in credit risk assessment for some emerging financial institutions. A typical form of data scarcity, a small sample with high dimensionality, often makes it impossible to build an effective credit risk assessment model. To address this issue, a data augmentation and hybrid feature selection method based on Wasserstein generative adversarial networks (WGAN) is proposed for small-sample credit risk assessment with high dimensionality. In this methodology, WGAN is first used to generate virtual samples to overcome the scarcity of data instances, and then a kernel partial least squares algorithm with quantum particle swarm optimization (KPLS-QPSO) is proposed to address the high dimensionality. For verification, two small-sample, high-dimensional credit datasets are used to demonstrate the effectiveness of the proposed methodology. Empirical results indicate that the proposed methodology can significantly improve prediction performance and avoid possible economic losses in credit risk assessment. This implies that the proposed methodology is a competitive approach to small-sample credit risk assessment with high dimensionality.
computer science, interdisciplinary applications; engineering, industrial; operations research & management science
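
For orientation, the augmentation step named in the abstract builds on the WGAN objective as it is usually stated in the literature. The abstract does not give the paper's own notation or say which WGAN variant is used (for example, weight clipping or a gradient penalty), so the symbols below are assumptions: P_r is the distribution of real credit samples, p(z) is the noise prior, G is the generator that produces virtual samples, and D is the critic, restricted to 1-Lipschitz functions.

```latex
\min_{G}\ \max_{D \in \mathcal{D}_{1\text{-Lip}}}\;
  \mathbb{E}_{x \sim P_r}\bigl[D(x)\bigr]
  \;-\;
  \mathbb{E}_{z \sim p(z)}\bigl[D\bigl(G(z)\bigr)\bigr]
```

The inner maximum estimates the Wasserstein-1 distance between the real and generated distributions, and the generator is trained to reduce it. This is a generic statement of the objective, not a reproduction of the paper's formulation.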