DimBoost

Jiawei Jiang,Bin Cui,Ce Zhang,Fangcheng Fu
DOI: https://doi.org/10.1145/3183713.3196892
2018-01-01
Abstract:Gradient boosting decision tree (GBDT) is one of the most popular machine learning models widely used in both academia and industry. Although GBDT has been widely supported by existing systems such as XGBoost, LightGBM, and MLlib, one system bottleneck appears when the dimensionality of the data becomes high. As a result, when we tried to support our industrial partner on datasets of the dimension up to 330K, we observed suboptimal performance for all these aforementioned systems. In this paper, we ask "Can we build a scalable GBDT training system whose performance scales better with respect to dimensionality of the data?"
What problem does this paper attempt to address?