SingleCaffe: an Efficient Framework for Deep Learning on a Single Node

Chenxu Wang, Yixian Shen, Jia, Yutong Lu, Zhiguang Chen, Bo Wang
DOI: https://doi.org/10.1109/access.2018.2879877
IF: 3.9
2018-01-01
IEEE Access
Abstract: Deep learning (DL) is currently the most promising approach for complex applications such as computer vision and natural language processing. It thrives on large neural networks and large datasets. However, larger models and datasets lead to longer training times, which impede research and development progress. The data-parallel nature and high computing power of modern hardware such as GPUs have driven its widespread adoption in DL frameworks such as Caffe, Torch, and TensorFlow. However, most DL frameworks cannot make full use of this high-performance hardware, so their computational efficiency is low. In this paper, we present SingleCaffe, a DL framework that makes full use of such hardware and improves the computational efficiency of the training process. SingleCaffe launches multiple threads to speed up training within a single node and applies data parallelism across these threads. During training, SingleCaffe designates one thread as the parameter server thread and the others as worker threads. Both data and workloads are distributed across the worker threads, while the server thread maintains the globally shared parameters. The framework also manages memory allocation carefully to reduce memory overhead. Experimental results show that SingleCaffe improves training efficiency, and its performance on a single node can even match that of distributed training on a dozen nodes.
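The server/worker thread arrangement described in the abstract can be illustrated with a minimal sketch. This is not SingleCaffe's code: the function name `train`, the toy model y = w * x, the synchronous update rule, and all parameters are assumptions chosen for illustration, and Python threads show only the coordination pattern, not the speedup a native multi-threaded framework would obtain.

```python
import queue
import threading


def train(data, num_workers=4, steps=200, lr=0.05):
    """Fit y = w * x with synchronous data-parallel SGD on threads (illustrative only)."""
    params = {"w": 0.0}  # globally shared parameter, updated only by the server thread
    grads = queue.Queue()  # worker threads push their gradients here
    # Workers and the server meet at this barrier after every update.
    barrier = threading.Barrier(num_workers + 1)
    # Distribute the data across worker threads (assumes len(data) >= num_workers).
    shards = [data[i::num_workers] for i in range(num_workers)]

    def worker(shard):
        for _ in range(steps):
            w = params["w"]  # read the current shared parameter
            # Gradient of the mean squared error on this worker's shard.
            g = sum(2 * (w * x - y) * x for x, y in shard) / len(shard)
            grads.put(g)
            barrier.wait()  # block until the server has applied the update

    def server():
        for _ in range(steps):
            # Collect one gradient per worker, then apply the averaged update.
            total = sum(grads.get() for _ in range(num_workers))
            params["w"] -= lr * total / num_workers
            barrier.wait()  # release the workers for the next step

    threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
    threads.append(threading.Thread(target=server))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return params["w"]
```

Calling `train` on points sampled from y = 3x, for example `train([(0.1 * i, 0.3 * i) for i in range(1, 21)])`, should return a value close to 3. The barrier makes each step synchronous, which is one possible design; the abstract does not state whether SingleCaffe updates parameters synchronously or asynchronously.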