Abstract: Machine learning methods are widely used in credit risk assessment for small and micro enterprises. In practice, however, these methods face a trade-off between accuracy and interpretability. In this study, a multi-stage ensemble model is proposed to enhance the model's interpretability. To strengthen predictive portraits, a multi-feature enhancement method is proposed that integrates non-financial behavioral information and soft information on credit rating into the annual loan ledger data, thereby bolstering the explanatory capacity of the features. To address data imbalance while avoiding information loss, a new bagging-based oversampling method is proposed that oversamples the minority class samples in multiple parallelized subsets created by the bagging strategy. To unlock the performance potential of the base classifiers, a new voting-weight optimization method is proposed that optimizes the soft voting weights of the candidate base classifiers. Experimental results on an annual loan ledger dataset of a commercial bank in China (an accuracy of 97.9%, an area under the curve of 0.97, a logistic loss of 0.07, a Brier score of 0.01, and a Kolmogorov-Smirnov statistic of 0.38) and on five other public datasets indicate excellent model fit. By focusing on the widespread soft information and the data structures characteristic of SME loan risk assessment data, an additional SHAP model explanation method enhances interpretability. This method reveals that the enhanced 'debt-to-income ratio', along with non-financial behavioral information and features derived from soft information, is essential for predicting loan defaults. Such enhancements help to alleviate information asymmetry in SME loan risk assessment.
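
The bagging-based oversampling idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the oversampling algorithm, so this sketch assumes plain random duplication of minority rows inside each bootstrap subset; the function name `bagging_oversample`, its parameters, and the binary 0/1 labels are hypothetical and are not taken from the paper.

```python
import random


def bagging_oversample(X, y, n_subsets=3, minority_label=1, seed=0):
    """Draw bootstrap subsets, then balance each one by duplicating minority rows.

    Assumes binary labels and at least one minority row in the input data.
    Random duplication is a stand-in here, not the paper's actual method.
    """
    rng = random.Random(seed)
    data = list(zip(X, y))
    # Fallback pool in case a bootstrap subset contains no minority rows.
    minority_pool = [row for row in data if row[1] == minority_label]
    subsets = []
    for _ in range(n_subsets):
        # Bagging step: sample with replacement, same size as the original data.
        subset = [rng.choice(data) for _ in range(len(data))]
        minority = [row for row in subset if row[1] == minority_label]
        n_majority = len(subset) - len(minority)
        deficit = n_majority - len(minority)
        pool = minority or minority_pool
        # Oversampling step: duplicate minority rows until both classes match.
        subset.extend(rng.choice(pool) for _ in range(max(deficit, 0)))
        subsets.append(subset)
    return subsets


if __name__ == "__main__":
    X = [[i] for i in range(10)]
    y = [0] * 8 + [1] * 2  # imbalanced toy labels: 8 non-defaults, 2 defaults
    for subset in bagging_oversample(X, y):
        defaults = sum(1 for _, label in subset if label == 1)
        print(len(subset), defaults)
```

Each balanced subset could then be used to train one base classifier. Because every subset keeps all of its majority rows, no information is discarded, which is the motivation the abstract gives for oversampling within the bagging subsets.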

Credit Scoring Model for Fintech Lending: an Integration of Large Language Models and FocalPoly Loss

Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models

How Do Machine Learning and Non-Traditional Data Affect Credit Scoring? New Evidence from a Chinese Fintech Firm

An Integrated Machine Learning and Deep Learning Framework for Credit Card Approval Prediction

Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending

Optimization of Personal Credit Evaluation Based on a Federated Deep Learning Model

A Novel Multi-Stage Ensemble Model With a Hybrid Genetic Algorithm for Credit Scoring on Imbalanced Data

A Vertical Federated Learning Method for Interpretable Scorecard and Its Application in Credit Scoring

CatMemo at the FinLLM Challenge Task: Fine-Tuning Large Language Models using Data Fusion in Financial Applications

A novel SSA-CatBoost machine learning model for credit rating

Feature Engineering for Credit Risk Evaluation in Online P2P Lending

Feature Enhanced Ensemble Modeling with Voting Optimization for Credit Risk Assessment

FinLangNet: A Novel Deep Learning Framework for Credit Risk Prediction Using Linguistic Analogy in Financial Data

A novel multi-stage ensemble model with enhanced outlier adaptation for credit scoring

A federated interpretable scorecard and its application in credit scoring

Data-Centric Financial Large Language Models

SNFinLLM: Systematic and Nuanced Financial Domain Adaptation of Chinese Large Language Models

AttentionFM: Incorporating Attention Mechanism and Factorization Machine for Credit Scoring

Application Analysis of Credit Scoring of Financial Institutions Based on Machine Learning Model

Financial risk assessment to improve the accuracy of financial prediction in the internet financial industry using data analytics models

Explaining Credit Risk Scoring through Feature Contribution Alignment with Expert Risk Analysts