Label Aggregation of Gradient Boosting Decision Trees

X. C. Xiang, H. X. Zhang, S. T. Xia
DOI: https://doi.org/10.1145/3421558.3421581
2020-01-01
Abstract: Gradient boosting decision tree (GBDT) is one of the most widely used ensemble learning methods in both academia and industry. The core idea of GBDT is to consecutively fit new base learners to the residual errors between the true and predicted values. However, GBDT is sensitive to noise in the data, especially label noise, which is very common in real-world data. Moreover, because GBDT is such a strong learner, the problem is not easily addressed with simple ensemble methods such as bagging. We therefore propose a new ensemble model of GBDT, called label aggregation of GBDTs (LA-GBDTs). The key points of our method lie in the random feature subspace and label aggregation, which lead to computational efficiency and robustness to noise. Experiments on several benchmark datasets show that LA-GBDTs consistently outperform GBDT algorithms in terms of accuracy. In addition, LA-GBDTs work well on datasets with a certain degree of label noise.
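The two ideas named in the abstract, fitting base learners to residuals and aggregating labels from models trained on random feature subspaces, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the loss, the subspace size, or the aggregation rule, so squared loss, depth-1 stumps, and majority voting are assumptions made here, and every function name (`fit_stump`, `fit_gbdt`, `fit_la_ensemble`, `aggregate_label`) and the toy data are hypothetical.

```python
import random

def fit_stump(X, r, features):
    """Best one-split regression stump on residuals r, using only `features`."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [r[i] for i, row in enumerate(X) if row[j] <= t]
            right = [r[i] for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            err = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or err < best[0]:
                best = (err, j, t, ml, mr)
    if best is None:  # no valid split: fall back to the mean residual
        m = sum(r) / len(r)
        return (features[0], float("inf"), m, m)
    return best[1:]

def stump_predict(stump, row):
    j, t, ml, mr = stump
    return ml if row[j] <= t else mr

def fit_gbdt(X, y, features, n_rounds=20, lr=0.5):
    """Squared-loss boosting: each new stump fits the residuals y - F(x)."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(X, resid, features)
        stumps.append(s)
        pred = [p + lr * stump_predict(s, row) for p, row in zip(pred, X)]
    return base, lr, stumps

def gbdt_label(model, row):
    """Threshold the boosted score at 0.5 to get a 0/1 label."""
    base, lr, stumps = model
    score = base + lr * sum(stump_predict(s, row) for s in stumps)
    return 1 if score >= 0.5 else 0

def fit_la_ensemble(X, y, n_models=5, n_sub=2, seed=0):
    """Train each boosted model on its own random feature subspace (assumed scheme)."""
    rng = random.Random(seed)
    d = len(X[0])
    return [fit_gbdt(X, y, rng.sample(range(d), n_sub)) for _ in range(n_models)]

def aggregate_label(models, row):
    """Aggregate the per-model labels by majority vote (assumed rule)."""
    votes = [gbdt_label(m, row) for m in models]
    return 1 if 2 * sum(votes) > len(votes) else 0

# Hypothetical toy data: three features, binary labels.
X = [[0.1, 0.2, 0.3], [0.2, 0.1, 0.0], [0.3, 0.3, 0.2], [0.0, 0.4, 0.1],
     [0.9, 0.8, 0.7], [0.8, 0.9, 1.0], [0.7, 0.7, 0.8], [1.0, 0.6, 0.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

models = fit_la_ensemble(X, y)
label = aggregate_label(models, [0.9, 0.8, 0.9])
```

Because each model searches for splits over only `n_sub` features, training is cheaper per model than searching the full feature set. The vote over several models is one plausible way that a label error fitted by a single model could be outweighed by the others.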