Cost-sensitive Regression Learning on Small Dataset Through Intra-Cluster Product Favoured Feature Selection

Fangfang Xu, Huan Zhao, Weihua Zhou, Yun Zhou
DOI: https://doi.org/10.1080/09540091.2021.1970719
2021-01-01
Connection Science
Abstract: Many regression and forecasting tasks are cost-sensitive regression learning problems, with asymmetric costs for over-prediction and under-prediction. However, classic methods such as clustering and feature selection struggle with small datasets. One key challenge is that traditional algorithms (e.g. the Boruta algorithm) have difficulty statistically validating the importance of features when little data is available. By leveraging feature information within clusters (groups of items with similar attributes), we propose an intra-cluster product favoured (ICPF) feature selection algorithm that selects features on top of a traditional filtering method (specifically the Boruta algorithm in our study). The experimental results show that the ICPF algorithm significantly reduces the dimensionality of the selected feature set and improves the performance of cost-sensitive regression learning. After adopting the ICPF algorithm, the misprediction cost decreased by 33.5% under a linear-linear cost function and by 32.4% under a quadratic-quadratic cost function. In addition, the advantage of the ICPF algorithm holds for other regression models, such as random forest and XGBoost.
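The abstract names two asymmetric cost functions but does not give their exact parameterisation. The following is a minimal sketch assuming the common piecewise forms: a linear-linear cost charges a hypothetical weight `a` per unit of under-prediction and `b` per unit of over-prediction, and a quadratic-quadratic cost applies the same weights to the squared error. Averaging over samples is also an assumption; the paper may sum or weight costs differently.

```python
def linlin_cost(y_true, y_pred, a=1.0, b=1.0):
    """Mean linear-linear cost (assumed form).

    a: hypothetical weight per unit of under-prediction (y_pred < y_true)
    b: hypothetical weight per unit of over-prediction (y_pred >= y_true)
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        err = p - y
        if err < 0:
            total += a * (-err)  # under-prediction penalised linearly by a
        else:
            total += b * err     # over-prediction penalised linearly by b
    return total / len(y_true)


def quadquad_cost(y_true, y_pred, a=1.0, b=1.0):
    """Mean quadratic-quadratic cost (assumed form): squared error weighted by a or b."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        err = p - y
        weight = a if err < 0 else b  # pick the weight by the sign of the error
        total += weight * err * err
    return total / len(y_true)


if __name__ == "__main__":
    # Illustrative values only: under-prediction is three times as costly as over-prediction.
    y_true = [10.0, 10.0]
    y_pred = [8.0, 13.0]  # one under-prediction by 2, one over-prediction by 3
    print(linlin_cost(y_true, y_pred, a=3.0, b=1.0))    # (3*2 + 1*3) / 2
    print(quadquad_cost(y_true, y_pred, a=3.0, b=1.0))  # (3*4 + 1*9) / 2
```

With `a > b`, a model that under-predicts is penalised more than one that over-predicts by the same amount, which is the asymmetry the abstract refers to; setting `a == b` recovers the symmetric mean absolute and mean squared errors.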