ForestLayer: Efficient Training of Deep Forests on Distributed Task-Parallel Platforms

Guanghui Zhu,Qiu Hu,Rong Gu,Chunfeng Yuan,Yihua Huang
DOI: https://doi.org/10.1016/j.jpdc.2019.05.001
IF: 4.542
2019-01-01
Journal of Parallel and Distributed Computing
Abstract:Most of the existing deep models are deep neural networks. Recently, the deep forest opens a door towards an alternative to deep neural networks for many tasks and has attracted more and more attention. At the same time, the deep forest model becomes widely used in many real-world applications. However, the existing deep forest system is inefficient and lacks scalability. In this paper, we present ForestLayer, which is an efficient and scalable deep forest system built on distributed task-parallel platforms. First, to improve the computing concurrency and reduce the communication overhead, we propose a fine-grained sub-forest based task-parallel algorithm. Next, we design a novel task splitting mechanism to reduce the training time without decreasing the accuracy of the original method. To further improve the performance of ForestLayer, we propose three system-level optimization techniques, including lazy scan, pre-pooling, and partial transmission. Besides the systematic optimization, we also propose a set of high-level programming APIs to improve the ease-of-use of ForestLayer. Finally, we have implemented ForestLayer on the distributed task-parallel platform Ray. The experimental results reveal that ForestLayer outperforms the existing deep forest system gcForest with 7× to 20.9× speedup on a range of datasets. In addition, ForestLayer outperforms TensorFlow-based implementation on most of the datasets, while achieving better predictive performance. Furthermore, ForestLayer achieves good scalability and load balance.
What problem does this paper attempt to address?