Abstract: In this paper, we present a novel parallel implementation for training Gradient Boosting Decision Trees (GBDTs) on Graphics Processing Units (GPUs). Thanks to their excellent results on classification and regression and to open-source libraries such as XGBoost, GBDTs have become very popular in recent years and have won many awards in machine learning and data mining competitions. Although GPUs have demonstrated their success in accelerating many machine learning applications, developing an efficient GPU-based GBDT algorithm is challenging. The key challenges include irregular memory accesses, many sorting operations on small inputs, and varying data-parallel granularities in tree construction. To tackle these challenges on GPUs, we propose various novel techniques, including (i) Run-length Encoding compression and dynamic allocation of thread/block workloads, (ii) data partitioning based on stable sort, and fast, memory-efficient attribute ID lookup in node splitting, (iii) finding approximate split points using two-stage histogram building, (iv) building histograms with awareness of sparsity and exploiting histogram subtraction to reduce the histogram-building workload, (v) reusing intermediate training results for efficient gradient computation, and (vi) exploiting multiple GPUs to handle larger data sets efficiently. Our experimental results show that our algorithm, named ThunderGBM, can be 10 times faster than the state-of-the-art libraries (i.e., XGBoost, LightGBM and CatBoost) running on a relatively high-end workstation with 20 CPU cores. In comparison with the GPU versions of these libraries, ThunderGBM can handle higher-dimensional problems on which those libraries become extremely slow or simply fail.
For the data sets that the existing GPU libraries can handle, ThunderGBM achieves up to 10 times speedup on the same hardware, which demonstrates the significance of our GPU optimizations. Moreover, the models trained by ThunderGBM are identical to those trained by XGBoost and are of similar quality to those trained by LightGBM and CatBoost.
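Technique (iv) relies on the fact that gradient/hessian histograms are additive: a parent node's histogram is the bin-wise sum of its two children's histograms, so one child's histogram can be obtained by subtracting its sibling's from the parent's instead of being rebuilt from the data (commonly the smaller child is built directly). The following is a minimal, sequential CPU sketch of that idea, together with a left-to-right scan over the bins that scores candidate split points. It is not ThunderGBM's GPU implementation. The helper names (`build_histogram`, `subtract_histogram`, `best_split`), the assumption that feature values are already mapped to bin indices, and the use of an XGBoost-style gain with L2 regularization `lam` (and no minimum-split-loss term) are all illustrative assumptions.

```python
from typing import List, Tuple

# Illustrative sketch only (hypothetical helpers, not ThunderGBM's API).
# A histogram is a list of (sum_of_gradients, sum_of_hessians), one pair per bin.
Histogram = List[Tuple[float, float]]


def build_histogram(bins: List[int], grads: List[float], hess: List[float],
                    rows: List[int], num_bins: int) -> Histogram:
    """Accumulate gradient/hessian sums per bin over the given subset of rows.

    Assumes bins[r] is the pre-computed bin index of row r for one feature.
    """
    hist: Histogram = [(0.0, 0.0)] * num_bins
    for r in rows:
        g, h = hist[bins[r]]
        hist[bins[r]] = (g + grads[r], h + hess[r])
    return hist


def subtract_histogram(parent: Histogram, child: Histogram) -> Histogram:
    """Histogram subtraction: sibling = parent - child, computed bin by bin."""
    return [(pg - cg, ph - ch) for (pg, ph), (cg, ch) in zip(parent, child)]


def best_split(hist: Histogram, lam: float = 1.0) -> Tuple[int, float]:
    """Scan bins left to right; return (threshold_bin, gain).

    Rows with bin <= threshold_bin go left. The gain is the XGBoost-style
    0.5 * (G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam)), without gamma.
    Returns (-1, 0.0) if no split has positive gain.
    """
    total_g = sum(g for g, _ in hist)
    total_h = sum(h for _, h in hist)
    parent_score = total_g * total_g / (total_h + lam)
    best_bin, best_gain = -1, 0.0
    gl = hl = 0.0
    for b in range(len(hist) - 1):  # the last bin cannot be a threshold
        gl += hist[b][0]
        hl += hist[b][1]
        gr, hr = total_g - gl, total_h - hl
        gain = 0.5 * (gl * gl / (hl + lam) + gr * gr / (hr + lam) - parent_score)
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain
```

For example, with a parent node over rows 0 to 5 whose left child holds rows 0, 1 and 3, `subtract_histogram(parent, left)` yields the same histogram as building the right child's histogram directly over rows 2, 4 and 5, without a second pass over those rows. On a GPU the same identity saves kernel launches and memory traffic, which is the workload reduction the abstract refers to.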