Training Acceleration for Deep Neural Networks: A Hybrid Parallelization Strategy

Zihao Zeng, Chubo Liu, Zhuo Tang, Wanli Chang, Kenli Li
DOI: https://doi.org/10.1109/DAC18074.2021.9586300
2021-01-01
Abstract: Deep Neural Networks (DNNs) are widely investigated due to their striking performance in various applications of artificial intelligence. However, as DNNs become larger and deeper, the computing resources of a single hardware accelerator are insufficient to meet the training requirements of popular DNNs. Hence, they must be trained on multiple accelerators in a distributed setting. To utilize the accelerators better and speed up training, the whole process must be partitioned into segments that can run in parallel. In this context, however, intra-layer parallelization techniques (i.e., data and model parallelization) often face communication and memory bottlenecks, while the performance and resource utilization of inter-layer parallelization techniques (i.e., pipelining) depend on how the model can be partitioned. We present EffTra, a synchronous hybrid parallelization strategy that combines intra-layer and inter-layer parallelism for distributed training of DNNs. EffTra uses dynamic programming to search for the optimal partitioning of a DNN model and assigns devices to the resulting partitions. Our evaluation shows that EffTra accelerates training by up to 2.0x and 1.78x compared to state-of-the-art inter-layer (i.e., GPipe) and intra-layer (i.e., data parallelism) parallelization techniques, respectively.
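To make the dynamic-programming idea in the abstract concrete, the following is a minimal sketch of a generic pipeline-partitioning problem: split a sequence of layers into contiguous stages so that the slowest stage is as fast as possible. It is not EffTra's actual algorithm, which the abstract does not detail. The function name `partition_layers`, the per-layer cost list, and the objective (minimizing the bottleneck stage) are all assumptions made for illustration, and the sketch ignores communication cost, memory limits, and the assignment of devices to partitions.

```python
def partition_layers(layer_costs, num_stages):
    """Split layers into contiguous stages, minimizing the slowest stage's cost.

    Hypothetical illustration only: layer_costs are assumed per-layer compute
    times, and communication between stages is ignored.
    Returns (bottleneck_cost, [(start, end), ...]) with half-open layer ranges.
    """
    n = len(layer_costs)
    if not 1 <= num_stages <= n:
        raise ValueError("num_stages must be between 1 and the number of layers")

    # prefix[i] is the total cost of the first i layers.
    prefix = [0]
    for cost in layer_costs:
        prefix.append(prefix[-1] + cost)

    inf = float("inf")
    # best[k][i]: smallest achievable bottleneck when the first i layers
    # are split into exactly k non-empty stages.
    best = [[inf] * (n + 1) for _ in range(num_stages + 1)]
    # cut[k][i]: index where the last of those k stages begins.
    cut = [[0] * (n + 1) for _ in range(num_stages + 1)]
    best[0][0] = 0

    for k in range(1, num_stages + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):
                # The last stage covers layers j..i-1; the rest is a subproblem.
                candidate = max(best[k - 1][j], prefix[i] - prefix[j])
                if candidate < best[k][i]:
                    best[k][i] = candidate
                    cut[k][i] = j

    # Walk the recorded cut points backwards to recover the stage boundaries.
    bounds = []
    end = n
    for k in range(num_stages, 0, -1):
        start = cut[k][end]
        bounds.append((start, end))
        end = start
    bounds.reverse()
    return best[num_stages][n], bounds


if __name__ == "__main__":
    costs = [4, 2, 3, 7, 1, 5]  # made-up per-layer costs
    bottleneck, stages = partition_layers(costs, 3)
    print(bottleneck, stages)
```

A hybrid strategy of the kind the abstract describes would additionally have to decide how many devices each stage receives, which enlarges the state space of the recurrence; this sketch fixes one device per stage to keep the recurrence readable.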