Asynch-SGBDT: Train Stochastic Gradient Boosting Decision Trees in an Asynchronous Parallel Manner.

Daning Cheng, Shigang Li, Yunquan Zhang
DOI: https://doi.org/10.1109/ipdps54959.2023.00034
2018-01-01
Abstract: Gradient Boosting Decision Tree (GBDT) is a machine learning model that is costly to train. Current parallel GBDT algorithms generally follow a synchronous parallel design: the fork-join parallel manner used by frameworks such as MapReduce, which takes considerable time. We therefore ask whether synchronization is necessary for GBDT training and whether an asynchronous training manner can be efficient. In this paper, we address these questions by proposing an asynchronous algorithm. We use sampling to build a stochastic optimization problem that shares the same output as the original GBDT training problem, and we train gradient-step GBDT on it in the asynchronous parallel SGD manner. We name our algorithm asynch-SGBDT. Our theoretical and experimental results indicate that, when the sample diversity of a dataset is high and GBDT is trained with gradient steps, asynch-SGBDT does not slow down per-epoch convergence compared with the serial GBDT training process; moreover, the sample diversity of current high-dimensional sparse datasets is usually high. We conduct experiments on a 32-node cluster using four different datasets. Taking LightGBM on a single worker as the baseline, LightGBM (the state-of-the-art synchronous parallel implementation) achieves a 5x-7x speedup on 32 workers, while asynch-SGBDT on 32 workers increases the speedup to 11x-15x.
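The abstract does not give the algorithm's details. The following is only a toy sketch, under stated assumptions, of the general idea it describes: stochastic gradient boosting on sampled rows, with workers pushing trees without a fork-join barrier and therefore computing gradients from a stale model. It is not the authors' implementation. All names (`fit_stump`, `predict`, `mse`, `async_sgbdt`, `XS`, `YS`) are hypothetical. It assumes squared loss, depth-1 trees (stumps) instead of full trees, and a deterministic round-robin simulation of staleness instead of real threads or a cluster.

```python
import random

# Hypothetical toy sketch, not the asynch-SGBDT implementation.
# Assumed loss: 0.5 * (y - F(x))^2, so the negative gradient is y - F(x).

def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (stump) to residuals by least squares."""
    pairs = sorted(zip(xs, residuals))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    best = None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        # Maximizing this score minimizes the split's squared error.
        score = i * left_mean ** 2 + (n - i) * right_mean ** 2
        if best is None or score > best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (score, threshold, left_mean, right_mean)
    if best is None:
        mean = total / n
        return lambda x: mean
    _, t, left_value, right_value = best
    return lambda x: left_value if x <= t else right_value

def predict(trees, x, lr):
    """Additive model: shrunken sum of all trees."""
    return sum(lr * tree(x) for tree in trees)

def mse(trees, xs, ys, lr):
    return sum((y - predict(trees, x, lr)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def async_sgbdt(xs, ys, n_workers=4, rounds=200, lr=0.1, sample_rate=0.5, seed=0):
    """Simulate asynchronous stochastic gradient boosting with stale reads.

    Workers finish in round-robin order. Each fits a tree against the model
    version it read when it last started, so after warm-up every update is
    computed from a model that is n_workers - 1 trees out of date.
    """
    rng = random.Random(seed)
    model = []                    # shared global model (list of trees)
    seen = [0] * n_workers        # number of trees each worker has read
    n_sampled = max(2, int(sample_rate * len(xs)))
    for step in range(rounds):
        w = step % n_workers
        stale = model[:seen[w]]                       # possibly outdated snapshot
        idx = rng.sample(range(len(xs)), n_sampled)   # stochastic row sample
        sub_x = [xs[i] for i in idx]
        grad = [ys[i] - predict(stale, xs[i], lr) for i in idx]
        model.append(fit_stump(sub_x, grad))          # push without a barrier
        seen[w] = len(model)                          # read latest model for next task
    return model

# Hypothetical 1-D step-function dataset.
XS = [float(i) for i in range(10)]
YS = [0.0 if x < 5 else 1.0 for x in XS]

if __name__ == "__main__":
    for workers in (1, 4):
        trees = async_sgbdt(XS, YS, n_workers=workers)
        print(workers, "worker(s): training MSE", round(mse(trees, XS, YS, 0.1), 4))
```

With `n_workers=1` the loop reduces to serial stochastic gradient boosting, because each worker always reads the full model. Larger `n_workers` values only increase the staleness of the gradients. A small learning rate keeps the delayed updates stable in this toy setting.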