Efficient Gradient Boosted Decision Tree Training on GPUs

Zeyi Wen,Bingsheng He,Kotagiri Ramamohanarao,Shengliang Lu,Jiashuai Shi
DOI: https://doi.org/10.1109/ipdps.2018.00033
2018-01-01
Abstract:In this paper, we present a novel parallel implementation for training Gradient Boosting Decision Trees (GBDTs) on Graphics Processing Units (GPUs). Thanks to the wide use of the open sourced XGBoost library, GBDTs have become very popular in recent years and won many awards in machine learning and data mining competitions. Although GPUs have demonstrated their success in accelerating many machine learning applications, there are a series of key challenges of developing a GPU-based GBDT algorithm, including irregular memory accesses, many small sorting operations and varying data parallel granularities in tree construction. To tackle these challenges on GPUs, we propose various novel techniques (including Run-length Encoding compression and thread/block workload dynamic allocation, and reusing intermediate training results for efficient gradient computation). Our experimental results show that our algorithm named GPU-GBDT is often 10 to 20 times faster than the sequential version of XGBoost, and achieves 1.5 to 2 times speedup over a 40 threaded XGBoost running on a relatively high-end workstation of 20 CPU cores. Moreover, GPU-GBDT outperforms its CPU counterpart by 2 to 3 times in terms of performance-price ratio.
What problem does this paper attempt to address?