Abstract: Personal credit scoring is a challenging problem. In recent years, research has shown that machine learning performs satisfactorily in credit scoring. Because they combine and select features as part of training, decision trees suit credit data, which are high-dimensional and have complex correlations. However, decision trees tend to overfit. eXtreme Gradient Boosting is an advanced gradient boosted tree method that overcomes this shortcoming by integrating multiple tree models. The structure of the model is determined by its hyperparameters; because manual tuning is time-consuming and laborious, an optimization method is employed to tune them. Because particle swarm optimization represents each particle's state and law of motion with continuous real numbers, the hyperparameters of eXtreme Gradient Boosting can be searched for their optimal values in a continuous space. However, classical particle swarm optimization tends to fall into local optima. To solve this problem, this paper proposes an eXtreme Gradient Boosting credit scoring model based on adaptive particle swarm optimization. A swarm split based on the idea of clustering, together with two kinds of learning strategies, guides the particles and improves the diversity of the subswarms, which prevents the algorithm from falling into a local optimum. In the experiments, several traditional machine learning algorithms and popular ensemble learning classifiers, as well as four hyperparameter optimization methods (grid search, random search, tree-structured Parzen estimator, and particle swarm optimization), are considered for comparison. Experiments were performed on four credit datasets and seven KEEL benchmark datasets using five popular evaluation measures: accuracy, error rate (type I error and type II error), Brier score, and F1 score. The results demonstrate that the proposed model outperforms the other models on average. Moreover, adaptive particle swarm optimization performs better than the other hyperparameter optimization strategies.
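To make the continuous search described in the abstract concrete, the following is a minimal sketch of classical global-best particle swarm optimization, not the adaptive variant the paper proposes (no swarm split or subswarm learning strategies). The function name `pso_minimize`, the inertia and acceleration coefficients, and the `toy_loss` objective are illustrative assumptions; in practice the objective would be something like a cross-validated loss of an eXtreme Gradient Boosting model evaluated at the candidate hyperparameters.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Classical global-best PSO over a continuous, box-bounded space.

    All default coefficients are illustrative assumptions, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position so far.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position so it stays inside the search space.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_loss(x):
    # Hypothetical stand-in for a cross-validated loss over two
    # continuous hyperparameters (names chosen for illustration only).
    learning_rate, subsample = x
    return (learning_rate - 0.1) ** 2 + (subsample - 0.8) ** 2


bounds = [(0.01, 0.5), (0.5, 1.0)]
best, best_val = pso_minimize(toy_loss, bounds)
print(best, best_val)
```

Because every particle is pulled toward a single global best, this classical form can stall once the swarm collapses around one point, which is the local-optimum behavior the abstract sets out to address.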
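The evaluation measures named in the abstract can be computed from true labels and predicted default probabilities as sketched below. The labeling convention is an assumption: here 1 marks a defaulting ("bad") applicant, type I error is the false positive rate, and type II error is the false negative rate. Credit scoring papers differ on this convention, so the paper's own definitions take precedence. The function name `credit_metrics` and the sample data are hypothetical.

```python
def credit_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, type I/II error, Brier score, and F1 for binary labels.

    Assumed convention (not confirmed by the source): label 1 is the
    positive class, type I error = FP / (FP + TN), and
    type II error = FN / (FN + TP).
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(y_true)
    return {
        "accuracy": (tp + tn) / n,
        "type_i_error": fp / (fp + tn) if fp + tn else 0.0,
        "type_ii_error": fn / (fn + tp) if fn + tp else 0.0,
        # Brier score: mean squared gap between probability and outcome.
        "brier": sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / n,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }


# Made-up labels and probabilities, for illustration only.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8, 0.3, 0.1]
metrics = credit_metrics(y_true, y_prob)
print(metrics)
```

Unlike the other measures, the Brier score is computed from the predicted probabilities rather than the thresholded labels, so it also reflects how well calibrated the scores are.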

Impact of resampling methods and classification models on the imbalanced credit scoring problems

Imbalanced Data Sets Classification Method Based on Over-Sampling Technique

Resampling Techniques Study on Class Imbalance Problem in Credit Risk Prediction

A Novel Multi-Stage Ensemble Model With a Hybrid Genetic Algorithm for Credit Scoring on Imbalanced Data

Classification of Imbalanced Credit scoring data sets Based on Ensemble Method with the Weighted-Hybrid-Sampling

Resampling ensemble model based on data distribution for imbalanced credit risk evaluation in P2P lending

A comparative study on machine learning models combining with outlier detection and balanced sampling methods for credit scoring

A ResNet-LSTM Based Credit Scoring Approach for Imbalanced Data

Fighting Sampling Bias: A Framework for Training and Evaluating Credit Scoring Models

XGBoost Optimized by Adaptive Particle Swarm Optimization for Credit Scoring

Value-Aware Resampling and Loss for Imbalanced Classification

A DBN-based resampling SVM ensemble learning paradigm for credit classification with imbalanced data

Application of Big Data Unbalanced Classification Algorithm in Credit Risk Analysis of Insurance Companies

Evaluating resampling methods on a real-life highly imbalanced online credit card payments dataset

Empirical Analysis of Ensemble Learning for Imbalanced Credit Scoring Datasets: A Systematic Review

Selecting the Suitable Resampling Strategy for Imbalanced Data Classification Regarding Dataset Properties. An Approach Based on Association Models

An Empirical Study on the Joint Impact of Feature Selection and Data Resampling on Imbalance Classification

Resampling approach for imbalanced data classification based on class instance density per feature value intervals

A New Hybrid Credit Scoring Ensemble Model with Feature Enhancement and Soft Voting Weight Optimization

An empirical evaluation of sampling methods for the classification of imbalanced data

Interpretable machine learning for imbalanced credit scoring datasets