PipePar: A Pipelined Hybrid Parallel Approach for Accelerating Distributed DNN Training

Jiange Li, Yuchen Wang, Jinghui Zhang, Jiahui Jin, Fang Dong, Lei Qian
DOI: https://doi.org/10.1109/CSCWD49262.2021.9437625
2021-01-01
Abstract: Large-scale DNN training tasks are exceedingly compute-intensive and time-consuming, and are usually executed on highly parallel platforms. Data parallelism and model parallelism are common ways to speed up training across devices. However, they tend to achieve sub-optimal performance due to communication overheads and unbalanced load among servers. Recent emerging pipelining solutions ...