Asynch-SGBDT: Train a Stochastic Gradient Boosting Decision Tree in an Asynchronous Parallel Manner

Daning Cheng, Fen Xia, Shigang Li, Yunquan Zhang
2019-01-01
Abstract: Gradient Boosting Decision Tree (GBDT) is an effective yet costly machine learning model. Current parallel GBDT algorithms generally follow a synchronous parallel design. Because processing time varies across nodes in practice, synchronisation in a parallel computing environment consumes considerable time. In this paper, we propose an asynchronous parallel GBDT algorithm named asynch-SGBDT. Our theoretical and experimental results indicate that, compared with the serial GBDT training process, asynch-SGBDT does not slow per-epoch convergence when the datasets are high-dimensional and sparse. Asynch-SGBDT achieves a 14x to 22x speedup with 32 workers. By comparison, LightGBM, one benchmark, achieves only a 5x to 7x speedup using 32 machines, and Dimboost, another benchmark, achieves only a 4x to 5x speedup using 32 workers. Both the theoretical and the experimental results show that asynch-SGBDT is a state-of-the-art parallel GBDT algorithm.
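To put the reported speedups on a common scale, one can convert them to parallel efficiency, taken here as speedup divided by worker count. This metric and the names in the sketch below (`REPORTED_SPEEDUPS`, `parallel_efficiency`) are illustrative assumptions and do not appear in the paper; only the speedup ranges and the count of 32 workers come from the abstract.

```python
# Speedup ranges quoted in the abstract, each measured with 32 workers/machines.
REPORTED_SPEEDUPS = {
    "asynch-SGBDT": (14.0, 22.0),
    "LightGBM": (5.0, 7.0),
    "Dimboost": (4.0, 5.0),
}
WORKERS = 32


def parallel_efficiency(speedup: float, workers: int) -> float:
    """Fraction of ideal linear speedup achieved (1.0 means perfect scaling)."""
    return speedup / workers


# Map each system to its (low, high) efficiency range.
efficiency = {
    name: (parallel_efficiency(low, WORKERS), parallel_efficiency(high, WORKERS))
    for name, (low, high) in REPORTED_SPEEDUPS.items()
}

for name, (low, high) in efficiency.items():
    print(f"{name}: {low:.1%} to {high:.1%} of ideal linear speedup")
```

Under this definition, the quoted figures place asynch-SGBDT at roughly 44% to 69% of ideal linear scaling, against roughly 16% to 22% for LightGBM and 13% to 16% for Dimboost.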