A Hierarchical Communication Algorithm for Distributed Deep Learning Training

Jiayu Zhang, Shaojun Cheng, Feng Dong, Ke Chen, Yong Qiao, Zhigang Mao, Jianfei Jiang
DOI: https://doi.org/10.1109/MWSCAS57524.2023.10405843
2023-01-01
Abstract: Distributed deep learning training has become an important workload on data center GPU clusters. However, in some cases the inter-node bandwidth is limited (e.g., 20 Gbps) and becomes a performance bottleneck that prevents existing deep learning systems from scaling training across multiple nodes. To address this bottleneck, we propose AS-SGD, a hierarchical communication algorithm that combines asynchronous SGD and synchronous SGD to make full use of both inter-node and intra-node network bandwidth. In addition, system optimization techniques such as quantization and decentralization are applied to further reduce communication costs. Finally, we evaluate the algorithm on a 4-node cluster (each node with 8 NVIDIA Tesla V100 GPUs). Experiments show that it achieves a speedup of up to 4.95X over existing state-of-the-art systems on popular deep learning models and datasets.
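To make the idea of hierarchical communication concrete, the following is a minimal toy sketch in pure Python. It is not the authors' AS-SGD implementation. It assumes that gradients are averaged synchronously among the GPUs of one node (a stand-in for an intra-node all-reduce over fast links), that each node's averaged gradient is compressed with symmetric 8-bit quantization, and that nodes then apply their updates to the model asynchronously over the slow inter-node network. The function names (`intra_node_average`, `quantize_int8`, `dequantize_int8`, `async_inter_node_apply`) and the quantization scheme are hypothetical choices made for illustration.

```python
from typing import List, Tuple

Vector = List[float]

def intra_node_average(gpu_grads: List[Vector]) -> Vector:
    """Synchronous step (assumed): average the gradients of all GPUs on one node,
    standing in for an intra-node all-reduce over high-bandwidth links."""
    n = len(gpu_grads)
    return [sum(vals) / n for vals in zip(*gpu_grads)]

def quantize_int8(vec: Vector) -> Tuple[List[int], float]:
    """Hypothetical compression: symmetric uniform 8-bit quantization with a
    single scale per tensor, shrinking what crosses the inter-node link."""
    max_abs = max((abs(v) for v in vec), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in vec]
    return q, scale

def dequantize_int8(q: List[int], scale: float) -> Vector:
    """Recover an approximate float gradient from the int8 codes."""
    return [x * scale for x in q]

def async_inter_node_apply(params: Vector,
                           node_updates: List[Tuple[List[int], float]],
                           lr: float) -> Vector:
    """Asynchronous step (assumed): apply each node's quantized gradient as soon
    as it arrives, with no barrier across nodes. Here arrivals are simulated
    sequentially in list order; in a real system other nodes keep computing
    meanwhile, so their gradients may be stale."""
    params = list(params)
    for q, scale in node_updates:
        g = dequantize_int8(q, scale)
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params

# Toy cluster: 2 nodes x 2 GPUs, 3-parameter model (made-up numbers).
cluster = [
    [[0.2, -0.4, 1.0], [0.4, -0.2, 1.0]],  # node 0
    [[1.0, 0.0, -0.5], [0.6, 0.2, -0.5]],  # node 1
]
params = [0.0, 0.0, 0.0]
updates = [quantize_int8(intra_node_average(node)) for node in cluster]
params = async_inter_node_apply(params, updates, lr=0.1)
print(params)  # close to [-0.11, 0.02, -0.05]; small deviations come from quantization
```

The split follows the abstract's motivation: the expensive synchronization barrier stays on the fast intra-node links, while the scarce inter-node bandwidth carries only one compressed gradient per node and is never blocked waiting for the slowest node.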