Gradient Boosting Survival Tree with Applications in Credit Scoring

Miaojun Bai, Yan Zheng, Yun Shen
DOI: https://doi.org/10.1080/01605682.2021.1919035
2021-04-12
Abstract: Credit scoring plays a vital role in the field of consumer finance. Survival analysis provides an advanced solution to the credit-scoring problem by quantifying the probability of survival time. In order to deal with highly heterogeneous industrial data collected in the Chinese consumer finance market, we propose a nonparametric ensemble tree model called gradient boosting survival tree (GBST) that extends the survival tree models with a gradient boosting algorithm. The survival tree ensemble is learned by minimizing the negative log-likelihood in an additive manner. The proposed model optimizes the survival probability simultaneously for each time period, which can reduce the overall error significantly. Finally, as a test of its applicability, we apply the GBST model to quantify credit risk with large-scale real market datasets. The results show that the GBST model outperforms the existing survival models as measured by the concordance index (C-index), the Kolmogorov-Smirnov (KS) index, and the area under the receiver operating characteristic curve (AUC) of each time period.
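The abstract's idea of learning a tree ensemble additively by minimizing the negative log-likelihood, with a survival probability for every time period, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a discrete-time hazard model with a logit link, weak learners that are single-split trees carrying one leaf value per period, and XGBoost-style Newton leaf values with an L2 penalty `lam`. All function names (`nll_grad_hess`, `fit_stump`, `boost`, `predict_survival`) and the toy data are hypothetical.

```python
import numpy as np


def nll_grad_hess(F, event_time, observed):
    """Discrete-time survival negative log-likelihood (assumed formulation).

    F:          (n, T) additive scores; per-period hazard h = sigmoid(F).
    event_time: (n,) int, 0-based index of the last period each loan was observed.
    observed:   (n,) bool, True if the loan defaulted in that period,
                False if it was censored at the end of that period.
    Returns the loss and its entrywise gradient and Hessian w.r.t. F.
    """
    periods = np.arange(F.shape[1])[None, :]
    at_risk = (periods <= event_time[:, None]).astype(float)
    y = ((periods == event_time[:, None]) & observed[:, None]).astype(float)
    # -log h = softplus(-F) and -log(1 - h) = softplus(F), computed stably
    per_period = y * np.logaddexp(0.0, -F) + (1.0 - y) * np.logaddexp(0.0, F)
    h = np.exp(-np.logaddexp(0.0, -F))
    loss = float(np.sum(at_risk * per_period))
    grad = at_risk * (h - y)
    hess = at_risk * h * (1.0 - h)
    return loss, grad, hess


def fit_stump(X, grad, hess, lam=1.0):
    """Hypothetical weak learner: one split, a vector of T leaf values per side."""
    def score(G, H):
        return np.sum(G ** 2 / (H + lam))

    root = score(grad.sum(0), hess.sum(0))
    best_gain, best_j, best_s = 0.0, None, None
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= s
            gain = (score(grad[left].sum(0), hess[left].sum(0))
                    + score(grad[~left].sum(0), hess[~left].sum(0)) - root)
            if gain > best_gain:
                best_gain, best_j, best_s = gain, j, s
    if best_j is None:  # no split improves the objective: single Newton leaf
        value = -grad.sum(0) / (hess.sum(0) + lam)
        return lambda Z: np.tile(value, (Z.shape[0], 1))
    left = X[:, best_j] <= best_s
    v_left = -grad[left].sum(0) / (hess[left].sum(0) + lam)
    v_right = -grad[~left].sum(0) / (hess[~left].sum(0) + lam)
    return lambda Z: np.where((Z[:, best_j] <= best_s)[:, None], v_left, v_right)


def boost(X, event_time, observed, T, n_rounds=30, lr=0.3):
    """Additive training: each round fits a stump to the current gradients."""
    F = np.zeros((X.shape[0], T))
    stumps, losses = [], []
    for _ in range(n_rounds):
        loss, g, h = nll_grad_hess(F, event_time, observed)
        losses.append(loss)
        stump = fit_stump(X, g, h)
        F += lr * stump(X)
        stumps.append(stump)
    return stumps, losses


def predict_survival(stumps, X, T, lr=0.3):
    """S[i, t] = prod_{k <= t} (1 - h[i, k]), i.e. P(no default through period t)."""
    F = np.zeros((X.shape[0], T))
    for stump in stumps:
        F += lr * stump(X)
    h = np.exp(-np.logaddexp(0.0, -F))
    return np.cumprod(1.0 - h, axis=1)


# Hypothetical toy data: a larger X[:, 0] means a higher per-period default hazard
rng = np.random.default_rng(0)
n, T = 300, 6
X = rng.normal(size=(n, 2))
p = 1.0 / (1.0 + np.exp(2.0 - 1.5 * X[:, 0]))
default_period = rng.geometric(p) - 1
censor_period = rng.integers(2, T, size=n)
event_time = np.minimum(default_period, censor_period)
observed = default_period <= censor_period

stumps, losses = boost(X, event_time, observed, T)
S = predict_survival(stumps, X, T)
print(losses[0], losses[-1])
```

The vector-valued leaves are the point of the sketch: a single split updates the hazard of every period at once, in the spirit of the abstract's statement that the model optimizes the survival probability for each time period simultaneously. A realistic model would use deeper trees and a validation-based stopping rule.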
Machine Learning
What problem does this paper attempt to address?
The main problem this paper addresses is how to quantify credit risk more accurately from the highly heterogeneous data of China's consumer finance market. Specifically, the authors propose a new method, the gradient boosting survival tree (GBST), to improve credit scoring models.

### Problem Background

1. **Importance of Credit Scoring**
   - Credit scoring plays a crucial role in consumer finance, especially with the rapid growth of Internet finance and e-commerce.
   - However, China's credit infrastructure is relatively weak: only about one third of consumers have credit ratings, which poses a major challenge to credit providers.
2. **Limitations of Existing Methods**
   - Traditional credit scoring relies mainly on classification models such as logistic regression and decision trees. These models are effective at predicting whether a borrower defaults, but they cannot handle time-to-event data.
   - Survival analysis handles time-to-event data and can predict both the probability of default and when it occurs. Applied to credit scoring, however, it still faces the problems of data heterogeneity and high dimensionality.

### Proposed Solution

To address these problems, the authors propose GBST, which combines survival trees with the gradient boosting algorithm:

1. **Model Features**
   - **Nonparametric ensemble tree model**: GBST learns an ensemble of survival trees additively by minimizing the negative log-likelihood, thereby optimizing the survival probability for each time period.
   - **Suited to high-dimensional heterogeneous data**: GBST is especially suited to the high-dimensional, noisy, sparse, and highly heterogeneous data of China's consumer finance market.
2. **Optimization Objective**
   - GBST fits its parameters by minimizing the negative log-likelihood loss, so that the overall error is minimized jointly across all time periods.
3. **Experimental Verification**
   - The authors ran experiments on large-scale real-market datasets. The results show that GBST outperforms existing survival models on several evaluation metrics: the concordance index (C-index), the Kolmogorov-Smirnov (KS) index, and the area under the ROC curve (AUC) for each time period.

### Summary

The core contribution of this paper is a new gradient boosting survival tree model that better handles the complex data of China's consumer finance market and improves the accuracy of credit risk assessment. GBST addresses the inability of traditional classification methods to handle time-to-event data and offers new ideas and techniques for future research.
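The evaluation metrics named above can be computed with the sketch below. It assumes Harrell's form of the C-index, the usual credit-scoring KS (the maximum gap between the score distributions of defaulters and non-defaulters), and one simple convention for the per-period AUC in which loans censored before period t are dropped because their status at t is unknown. The paper's exact evaluation protocol may differ, and the function names and toy arrays are hypothetical.

```python
import numpy as np


def harrell_c_index(time, event, risk):
    """Harrell's concordance index.

    A pair (i, j) is comparable when loan i defaulted (event[i] is True)
    strictly before time[j]; it is concordant when risk[i] > risk[j].
    Ties in risk count as half-concordant.
    """
    num, den = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue
        for j in range(len(time)):
            if time[i] < time[j]:
                den += 1
                if risk[i] > risk[j]:
                    num += 1.0
                elif risk[i] == risk[j]:
                    num += 0.5
    return num / den


def ks_statistic(label, score):
    """Max gap between the empirical score CDFs of defaulters and non-defaulters."""
    label = np.asarray(label, dtype=bool)
    score = np.asarray(score, dtype=float)
    grid = np.unique(score)
    cdf_bad = np.searchsorted(np.sort(score[label]), grid, side="right") / label.sum()
    cdf_good = np.searchsorted(np.sort(score[~label]), grid, side="right") / (~label).sum()
    return float(np.max(np.abs(cdf_bad - cdf_good)))


def auc(label, score):
    """ROC AUC via the Mann-Whitney statistic (ties count one half)."""
    label = np.asarray(label, dtype=bool)
    score = np.asarray(score, dtype=float)
    pos, neg = score[label], score[~label]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return float(wins / (len(pos) * len(neg)))


def auc_by_period(event_time, observed, surv, t):
    """AUC for 'default by period t', scored by 1 - S(t | x).

    Loans censored before period t have unknown status at t and are dropped.
    """
    label = observed & (event_time <= t)
    known = label | (event_time >= t)
    return auc(label[known], 1.0 - surv[known, t])


# Hypothetical toy example: 4 loans, 3 periods, predicted survival curves
event_time = np.array([0, 1, 2, 2])
observed = np.array([True, True, False, False])
surv = np.array([[0.50, 0.20, 0.10],
                 [0.90, 0.50, 0.30],
                 [0.95, 0.90, 0.85],
                 [0.97, 0.92, 0.88]])
risk = 1.0 - surv[:, -1]
print(harrell_c_index(event_time, observed, risk))
print(ks_statistic(observed, risk))
print([auc_by_period(event_time, observed, surv, t) for t in range(3)])
```

The pairwise loops are O(n^2) and written for clarity; on portfolios of realistic size, a sort-based implementation is preferable.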