Multiple optimized ensemble learning for high-dimensional imbalanced credit scoring datasets

Sudhansu R. Lenka, Sukant Kishoro Bisoy, Rojalina Priyadarshini
DOI: https://doi.org/10.1007/s10115-024-02129-z
IF: 2.7
2024-05-24
Knowledge and Information Systems
Abstract: Credit scoring models are crucial tools for lenders to assess credit risk, and they have drawn intense interest from researchers in academia and the financial industry. However, real credit datasets are often high-dimensional and class-imbalanced, which makes it difficult to build accurate and effective credit scoring models. To address these challenges, this paper proposes the Multiple-Optimized Ensemble Learning (MOEL) method. In MOEL, a technique called Multiple Diverse Optimized Subsets (MDOS) generates multiple diverse optimized subsets from various weighted random forests. From each subset, the more effective and relevant features are selected. A new evaluation measure is then applied to each subset to identify the better-optimized subsets. These subsets are passed to a novel Mahalanobis-based oversampling (MOS) technique, which produces balanced subsets for the base classifier and lessens the detrimental effects of class imbalance. Finally, a stacking-based ensemble method is applied to the balanced subsets to integrate the base models. The proposed model was evaluated on six high-dimensional imbalanced credit scoring datasets and outperformed state-of-the-art methods, achieving mean ranks of 1.5 and 1.333 in terms of F1-score and G-mean, respectively.
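The two evaluation measures named in the abstract, F1-score and G-mean, can be illustrated with a minimal sketch. It uses the standard textbook definitions (F1 as the harmonic mean of precision and recall, G-mean as the geometric mean of sensitivity and specificity); the function name `f1_and_gmean` and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math

def f1_and_gmean(tp, fp, tn, fn):
    """Return (F1, G-mean) from binary confusion-matrix counts.

    Assumes the minority (e.g. default) class is the positive class.
    Undefined ratios (zero denominators) are treated as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    gmean = math.sqrt(recall * specificity)
    return f1, gmean

# Hypothetical imbalanced test set: 50 positives, 950 negatives.
f1, gmean = f1_and_gmean(tp=40, fp=10, tn=940, fn=10)

# A classifier that always predicts the majority class reaches 95%
# accuracy on the same data, yet scores 0 on both measures.
f1_majority, gmean_majority = f1_and_gmean(tp=0, fp=0, tn=950, fn=50)
```

Both measures collapse to zero for the majority-only classifier, which is why they are commonly preferred over plain accuracy on imbalanced data.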
computer science, information systems, artificial intelligence