Aware: Adaptive Distributed Training with Computation, Communication and Position Awareness for Deep Learning Model.

Yan Zeng, Guangzheng Yi, Yuyu Yin, Jiyang Wu, Meiting Xue, Jilin Zhang, Jian Wan, Yunquan Zhang
DOI: https://doi.org/10.1109/hpcc-dss-smartcity-dependsys57074.2022.00203
2022-01-01
Abstract: The accuracy of neural networks can usually be improved by enlarging the dataset and by adding layers or operators to the network, since such networks are highly composable. However, this makes the models hard to train efficiently, because a single device has limited resources such as memory and compute. Automatically applying model parallelism across multiple devices has therefore become an inevitable trend. This paper proposes Aware, an adaptive distributed parallel training method that automatically searches for a distributed parallel strategy for deep learning models. It first applies an operator fusion strategy with computation and communication awareness to simplify the model's computational graph. It then introduces a position-aware graph embedding algorithm to extract the structural features of the model, which allows the searched parallel strategies to be transferred to other networks with similar structures. On this basis, a reinforcement learning algorithm searches for the distributed parallel strategy automatically. Experiments are run on neural networks such as Inception V3, NMT, GNM, NasNet and ResNet, using the CIFAR10 and PTB datasets, and Aware is compared against two methods, Hierarchical and Placeto. The results show that Aware reduces runtime by up to 5% compared with Placeto and by more than 8% compared with Hierarchical. Moreover, it transfers and generalizes better, supports pre-training for large-scale network parallel strategy search, and accelerates convergence.
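The abstract does not state the rule Aware uses to decide which operators to fuse, so the following is only a toy sketch of what computation- and communication-aware fusion could look like. It assumes a hypothetical rule: two operators on a linear chain are merged when the cost of transferring the tensor between them exceeds the compute cost of the cheaper operator. The `Graph` class, the cost numbers, and the operator names are all illustrative and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Graph:
    """Toy computational graph with per-op compute and per-edge transfer costs."""
    compute: Dict[str, float]
    comm: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def successors(self, u: str) -> List[str]:
        return [b for (a, b) in self.comm if a == u]

    def predecessors(self, v: str) -> List[str]:
        return [a for (a, b) in self.comm if b == v]


def fuse_once(g: Graph) -> bool:
    """Fuse the first qualifying edge; return True if the graph changed."""
    for (u, v), cost in list(g.comm.items()):
        # Only fuse along a linear chain, so merging cannot create a cycle.
        chain = g.successors(u) == [v] and g.predecessors(v) == [u]
        # Assumed rule: fuse when the transfer costs more than the cheaper op.
        if chain and cost > min(g.compute[u], g.compute[v]):
            merged = f"{u}+{v}"
            new_compute = {k: c for k, c in g.compute.items() if k not in (u, v)}
            new_compute[merged] = g.compute[u] + g.compute[v]
            new_comm = {}
            for (a, b), c in g.comm.items():
                if (a, b) == (u, v):
                    continue  # the fused edge disappears
                a2 = merged if a in (u, v) else a
                b2 = merged if b in (u, v) else b
                new_comm[(a2, b2)] = c
            g.compute, g.comm = new_compute, new_comm
            return True
    return False


def fuse(g: Graph) -> Graph:
    """Apply fusion repeatedly until no edge qualifies."""
    while fuse_once(g):
        pass
    return g


# Illustrative graph: conv -> bias -> relu, then a fork into two branches.
g = fuse(Graph(
    compute={"conv": 4.0, "bias": 0.5, "relu": 0.5,
             "branch_a": 3.0, "branch_b": 3.0},
    comm={("conv", "bias"): 2.0, ("bias", "relu"): 2.0,
          ("relu", "branch_a"): 1.0, ("relu", "branch_b"): 1.0},
))
print(sorted(g.compute))
print(g.comm)
```

In this example the three chained operators collapse into a single node while the fork is left intact, which shrinks the graph that a placement search has to consider. A real system would estimate these costs from profiling rather than from fixed numbers.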