Scaling The Training Of Recurrent Neural Networks On Sunway TaihuLight Supercomputer

Ouyi Li, Wenlai Zhao, Xuancheng Huang, Yushu Chen, Lin Gan, Hongkun Yu, Jiacheng Zhang, Yang Liu, Haohuan Fu, Guangwen Yang
DOI: https://doi.org/10.1007/978-3-030-22734-0_31
2019-01-01
Abstract: Recurrent neural network (RNN) models require longer training times as datasets and parameter counts grow. Distributed training with a large mini-batch size is a potential way to accelerate the whole training process. This paper proposes a framework for large-scale training of RNN/LSTM models on the Sunway TaihuLight (SW) supercomputer. We apply a series of architecture-oriented optimizations to the memory-intensive kernels in RNN models to improve computing performance. A lazy communication scheme with an improved communication implementation and a distributed training and testing scheme are proposed to achieve high scalability for distributed training. Furthermore, we explore the training algorithm with a large mini-batch size in order to improve convergence speed without losing accuracy. The framework supports training RNN models with a large number of parameters on up to 800 training nodes. The evaluation results show that, compared to training on a single computing node, training based on the proposed framework can achieve a 100-fold convergence rate with a mini-batch size of 8,000.