Empirical Analysis of Ensemble Learning for Imbalanced Credit Scoring Datasets: A Systematic Review

Sudhansu R. Lenka, Sukant Kishoro Bisoy, Rojalina Priyadarshini, Mangal Sain
DOI: https://doi.org/10.1155/2022/6584352
2022-06-16
Wireless Communications and Mobile Computing
Abstract: Credit scoring analysis has gained tremendous importance for researchers and financial institutions around the globe. It helps financial institutions grant credit or loans to deserving applicants with zero or minimal risk. However, developing an accurate and effective credit scoring model is challenging because of class imbalance and the presence of irrelevant features. Recent research shows that ensemble learning has achieved supremacy in this field. In this paper, we perform an extensive comparative analysis of ensemble algorithms; to improve the models further, oversampling and feature selection (FS) techniques are implemented. The relevant features are identified using three FS techniques: information gain (IG), principal component analysis (PCA), and genetic algorithm (GA). Additionally, a comparative performance analysis is performed using 5 base and 14 ensemble models on three credit scoring datasets. The experimental results show that the GA-based FS technique and the CatBoost algorithm perform significantly better than the other models in terms of five metrics: accuracy (ACC), area under the curve (AUC), F1-score, Brier score (BS), and the Kolmogorov-Smirnov (KS) statistic.
computer science, information systems; telecommunications; engineering, electrical & electronic
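
Two of the metrics named in the abstract, the Brier score (BS) and the Kolmogorov-Smirnov (KS) statistic, are less familiar than accuracy or AUC. The sketch below shows how they are commonly computed from predicted default probabilities. It assumes the standard textbook definitions (BS as the mean squared error of the predicted probabilities, KS as the largest gap between the score distributions of the two classes); the abstract does not spell out the paper's exact formulas. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 label.

    Lower is better; 0.0 means perfectly confident, correct predictions.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def ks_statistic(y_true, y_prob):
    """Largest gap between the empirical score CDFs of the two classes.

    Higher is better; 1.0 means the scores separate the classes perfectly.
    """
    pos = sorted(p for y, p in zip(y_true, y_prob) if y == 1)
    neg = sorted(p for y, p in zip(y_true, y_prob) if y == 0)
    if not pos or not neg:
        raise ValueError("both classes must be present")
    best = 0.0
    # Evaluate both CDFs at every observed score and keep the widest gap.
    for t in sorted(set(y_prob)):
        cdf_pos = bisect_right(pos, t) / len(pos)
        cdf_neg = bisect_right(neg, t) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Toy data (hypothetical, not from the paper): 1 = default, 0 = non-default.
labels = [0, 0, 0, 1, 0, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]

print("Brier score:", brier_score(labels, scores))
print("KS statistic:", ks_statistic(labels, scores))
```

Both metrics work on predicted probabilities rather than hard labels, so they can still distinguish models on an imbalanced dataset where accuracy alone looks uniformly high.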