Abstract: Machine learning methods are widely used in credit risk assessment for small and micro enterprises. In practice, however, these methods face a trade-off between accuracy and interpretability. This study proposes a multi-stage ensemble model with enhanced interpretability. To strengthen predictive portraits, a multi-feature enhancement method integrates non-financial behavioral information and soft information on credit rating into the annual loan ledger data, thereby bolstering the explanatory capacity of the features. To correct data imbalance while avoiding information loss, a new bagging-based oversampling method oversamples the minority class in multiple parallel subsets produced by the bagging strategy. To exploit the full performance potential of the base classifiers, a new voting-weight optimization method optimizes the soft voting weights of the candidate base classifiers. Experimental results on an annual loan ledger dataset from a commercial bank in China (accuracy of 97.9%, area under the curve of 0.97, logistic loss of 0.07, Brier score of 0.01, and Kolmogorov-Smirnov statistic of 0.38) and on five other public datasets indicate excellent model fit. Focusing on the widespread soft information and data structures characteristic of SME loan risk assessment data, an additional SHAP-based model explanation method enhances interpretability. It reveals that the enhanced 'debt-to-income ratio', along with non-financial behavioral information and features derived from soft information, is essential for predicting loan defaults. Such enhancements help to alleviate information asymmetry in SME loan risk assessment.
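As a rough illustration of two ideas named in the abstract, the sketch below oversamples the minority class inside each bagging subset and then searches for soft voting weights. It is a minimal sketch under stated assumptions, not the authors' method: the abstract does not specify the oversampler or the optimizer, so random duplication and a coarse grid search minimizing log loss stand in for them, and every function name (`bagging_oversample`, `soft_vote`, `optimize_weights`) is hypothetical.

```python
import math
import random
from itertools import product


def bagging_oversample(X, y, n_subsets=3, seed=0):
    """Draw bootstrap subsets, then duplicate random minority rows in each
    subset until both classes have the same size.
    Assumption: random duplication stands in for the paper's oversampler."""
    rng = random.Random(seed)
    n = len(y)
    subsets = []
    for _ in range(n_subsets):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        pos = [i for i in idx if y[i] == 1]
        neg = [i for i in idx if y[i] == 0]
        if pos and neg:
            minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
            gap = len(majority) - len(minority)
            idx = idx + [rng.choice(minority) for _ in range(gap)]
        # A subset that happens to contain one class only is left unbalanced.
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets


def soft_vote(probas, weights):
    """Weighted average of each classifier's predicted default probability."""
    total = sum(weights)
    n = len(probas[0])
    return [sum(w * p[i] for w, p in zip(weights, probas)) / total
            for i in range(n)]


def log_loss(y_true, p, eps=1e-12):
    """Mean binary cross-entropy, with probabilities clipped for stability."""
    loss = 0.0
    for t, q in zip(y_true, p):
        q = min(max(q, eps), 1 - eps)
        loss -= t * math.log(q) + (1 - t) * math.log(1 - q)
    return loss / len(y_true)


def optimize_weights(probas, y_true, steps=10):
    """Grid-search non-negative weights summing to 1 that minimize log loss.
    Assumption: the paper's optimizer and objective may differ."""
    best_w, best_loss = None, math.inf
    for combo in product(range(steps + 1), repeat=len(probas)):
        if sum(combo) != steps:
            continue
        w = [c / steps for c in combo]
        loss = log_loss(y_true, soft_vote(probas, w))
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss
```

In use, one base classifier would be trained per balanced subset, and `optimize_weights` would be run on held-out predicted probabilities so that the weights are not fitted to the training data.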
