Preprocessing and Feature Extraction Methods for Microfinance Overdue Data.

Jiahao Wang, Liang Zhang, Peiyi Shen, Guangming Zhu, Yuhuai Zhang
DOI: https://doi.org/10.1007/978-981-13-2922-7_2
IF: 4.426
2018-01-01
Big Data
Abstract: With the rapid development of the microfinance industry, the number of customers has surged and the bad debt rate has risen dramatically. The increase in overdue customers has led to a substantial increase in business volume in the collection industry. However, under the current policy of protecting customer privacy, the collection industry faces two major issues: the lack of credit information, and constraints on the cost and scale of collection. This paper proposes a repayment probability forecasting system that does not rely on credit information but can still improve collection efficiency. The proposed system focuses on preprocessing more than one hundred thousand overdue records, using word2vec to locate keywords, and extracting features from the data according to their types. The system relies on mature machine learning models to predict customers' ability to repay, including LR, GBDT, XGBoost and RF. To evaluate the system's performance, we use not only AUC but also a new evaluation index designed to fit the business background. Experimental results show that, with business volume surging and only around 1.5% of overdue customers repaying, collecting from only the higher-scoring half of customers under our system can increase the repayment rate by at least 1.2%, which greatly increases work efficiency and reduces the manual labor required for collection.
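The evaluation idea in the abstract (rank customers by predicted repayment score, then collect from only the higher-scoring half) can be illustrated with a minimal sketch. The abstract does not define the paper's new evaluation index, so the top-fraction repayment rate below is an assumed stand-in, not the authors' metric. The function names `auc` and `top_fraction_repayment_rate` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def auc(labels, scores):
    """AUC as the probability that a random repayer outscores a random
    non-repayer; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_fraction_repayment_rate(labels, scores, fraction=0.5):
    """Repayment rate among the highest-scoring `fraction` of customers.
    Assumed stand-in for a business-oriented index, not the paper's own."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(y for _, y in ranked[:k]) / k


# Toy data (hypothetical): 1 = customer repaid, 0 = customer did not repay.
labels = [1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]

base_rate = sum(labels) / len(labels)
top_half_rate = top_fraction_repayment_rate(labels, scores, fraction=0.5)
print("AUC:", auc(labels, scores))
print("base repayment rate:", base_rate)
print("top-half repayment rate:", top_half_rate)
```

Comparing the top-half rate with the base rate shows how much a scoring model concentrates repayers among the customers that collectors would contact first. In practice the scores would come from a fitted model such as LR, GBDT, XGBoost or RF rather than from hand-written values.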