Fast Training and Model Compression of Gated RNNs via Singular Value Decomposition

Rui Dai, Lefei Li, Wenjian Yu
DOI: https://doi.org/10.1109/IJCNN.2018.8489156
2018-01-01
Abstract: Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks are two widely used gated Recurrent Neural Network (RNN) architectures. Both usually have a large model size and take a long time to train. In this paper, we first propose a singular value decomposition (SVD) based approach for fast training of LSTM. We then propose a factorized model and an SVD-based training approach for the GRU network, which adaptively choose the rank parameter of the matrix factorization model and reduce both the training time and the parameter count of the gated RNNs. Experiments are carried out on image classification and sentiment classification tasks using the MNIST and IMDB datasets, respectively. The results show that the proposed LSTM-SVD approach achieves up to 3.9X speedup over training the original LSTM model, without loss of accuracy. The approaches for training the GRU network achieve about 2X speedup. With the factorized models, the number of RNN cell parameters can be reduced by more than 10X.
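The abstract does not say how the rank is chosen or which weight matrices are factorized, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It relies on one general fact: replacing an m x n matrix with a rank-r product of an m x r and an r x n matrix stores r(m + n) values instead of mn. The spectral-energy rule in `choose_rank`, the helper names, the stacked four-gate layout, and all sizes and singular values are hypothetical and chosen for illustration.

```python
def choose_rank(singular_values, energy=0.95):
    """Return the smallest rank whose leading singular values retain
    at least `energy` of the total squared spectrum.

    Assumption: this energy rule is a common heuristic used here for
    illustration; the abstract does not state the paper's criterion.
    """
    squared = sorted((s * s for s in singular_values), reverse=True)
    total = sum(squared)
    if total == 0:
        return 1  # degenerate all-zero matrix: keep a single component
    kept = 0.0
    for rank, value in enumerate(squared, start=1):
        kept += value
        if kept >= energy * total:
            return rank
    return len(squared)


def dense_params(rows, cols):
    """Parameter count of a full rows x cols weight matrix."""
    return rows * cols


def factorized_params(rows, cols, rank):
    """Parameter count of a rank-`rank` factorization (rows x rank)(rank x cols)."""
    return rank * (rows + cols)


if __name__ == "__main__":
    # Hypothetical cell: four gate matrices stacked into one matrix acting
    # on the concatenated [input, hidden] vector. Sizes are illustrative.
    hidden, inputs = 128, 28
    rows, cols = 4 * hidden, inputs + hidden

    # Made-up singular values standing in for a real SVD of the weights.
    rank = choose_rank([10.0, 5.0, 1.0, 0.5, 0.1], energy=0.95)

    full = dense_params(rows, cols)
    low = factorized_params(rows, cols, rank)
    print(f"rank={rank}, dense={full}, factorized={low}, ratio={full / low:.1f}")
```

The factorization only saves parameters when r(m + n) < mn, so in practice the chosen rank has to stay well below min(m, n) for the compression to pay off.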