An extreme learning machine based virtual sample generation method with feature engineering for credit risk assessment with data scarcity

Lean Yu, Xiaoming Zhang, Hang Yin
DOI: https://doi.org/10.1016/j.eswa.2022.117363
IF: 8.5
2022-09-15
Expert Systems with Applications
Abstract: Small samples, a typical form of data scarcity, often make it difficult to build a reliable machine learning model for credit risk assessment, and many virtual sample generation (VSG) methods have therefore been proposed to augment samples based on the sample distribution. When a credit dataset has both a small sample and low dimensionality, it becomes even more difficult for credit institutions to predict customer credit status. To address these issues, an extreme learning machine (ELM) based VSG methodology with feature engineering is proposed for credit risk assessment with data scarcity. First, the ELM-based VSG method generates virtual samples to address data instance scarcity (i.e., small sample size). Second, feature engineering addresses data attribute scarcity (i.e., low dimensionality). Finally, various classifiers are used to evaluate the predictive performance obtained with the generated virtual samples. For verification, two public credit datasets are used to conduct credit classification under data scarcity. The experimental results show that the proposed methodology can effectively improve classification performance for credit risk assessment with data scarcity.
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Operations Research & Management Science
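
To make the first step of the abstract concrete, the following is a minimal sketch of how an ELM could be used to label virtual samples. It is not the authors' algorithm: the abstract does not specify how virtual inputs are produced, so the Gaussian jitter around real samples, the 0.5 decision threshold, and the names `elm_fit`, `elm_predict`, `generate_virtual_samples`, `n_hidden`, and `noise_scale` are all assumptions made for illustration. The toy data is synthetic and stands in for a small credit dataset.

```python
import numpy as np


def elm_fit(X, y, n_hidden=20, seed=0):
    """Fit a basic ELM: random, untrained hidden layer plus least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (never updated)
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                       # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                 # output weights via the pseudo-inverse
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Return the raw (real-valued) ELM output for each row of X."""
    return np.tanh(X @ W + b) @ beta


def generate_virtual_samples(X, y, n_virtual=50, noise_scale=0.1, seed=0):
    """Assumed scheme: jitter real samples, then let the fitted ELM assign labels."""
    rng = np.random.default_rng(seed)
    model = elm_fit(X, y, seed=seed)
    idx = rng.integers(0, len(X), size=n_virtual)  # pick real samples to perturb
    noise = rng.normal(scale=noise_scale * X.std(axis=0),
                       size=(n_virtual, X.shape[1]))
    X_virtual = X[idx] + noise
    y_virtual = (elm_predict(X_virtual, *model) >= 0.5).astype(int)
    return X_virtual, y_virtual


# Synthetic stand-in for a small, low-dimensional credit dataset (12 rows, 3 features).
rng = np.random.default_rng(42)
X_small = rng.normal(size=(12, 3))
y_small = (X_small[:, 0] + X_small[:, 1] > 0).astype(int)

X_virtual, y_virtual = generate_virtual_samples(X_small, y_small)
X_aug = np.vstack([X_small, X_virtual])          # real + virtual samples
y_aug = np.concatenate([y_small, y_virtual])
```

The augmented arrays `X_aug` and `y_aug` would then be passed through the feature engineering step and on to the downstream classifiers, neither of which is sketched here because the abstract gives no detail about them.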