Gated Convolutional LSTM for Speech Commands Recognition

Dong Wang, Shaohe Lv, Xiaodong Wang, Xinye Lin
DOI: https://doi.org/10.1007/978-3-319-93701-4_53
2018-01-01
Abstract: As mobile devices gain popularity, Acoustic Speech Recognition on them is becoming a leading application. Unfortunately, the limited battery and computational resources of a mobile device severely restrict the potential of Speech Recognition systems, most of which have to resort to a remote server for better performance. To improve the performance of local Speech Recognition, we propose C-1-G-2-Blstm. This model combines a Convolutional Neural Network's ability to learn local features with a Recurrent Neural Network's ability to learn long-range dependencies in sequence data. Furthermore, by adopting a Gated Convolutional Neural Network instead of a traditional CNN, we greatly improve the model's capacity. Our tests demonstrate that C-1-G-2-Blstm achieves an accuracy of 90.6% on the Google Speech Commands data set, which is 6.4% higher than state-of-the-art methods.
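The abstract does not spell out the gating formula, so the following is only a minimal sketch of what a gated convolution usually means, assuming the common gated-linear-unit form: a linear convolution multiplied elementwise by the sigmoid of a second convolution. The function names, kernel values, and the 1-D input are hypothetical illustrations, not the authors' implementation.

```python
import math

def conv1d(x, kernel, bias=0.0):
    """Valid 1-D cross-correlation of a sequence with a kernel."""
    k = len(kernel)
    return [
        sum(kernel[j] * x[i + j] for j in range(k)) + bias
        for i in range(len(x) - k + 1)
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_conv1d(x, w_lin, w_gate, b_lin=0.0, b_gate=0.0):
    """Assumed gating form: conv(x, w_lin) * sigmoid(conv(x, w_gate))."""
    linear = conv1d(x, w_lin, b_lin)
    gate = [sigmoid(g) for g in conv1d(x, w_gate, b_gate)]
    # The gate (between 0 and 1) decides how much of each linear feature passes on.
    return [a * g for a, g in zip(linear, gate)]

# Hypothetical toy input standing in for one acoustic feature track over time.
features = [0.0, 1.0, 2.0, 3.0, 4.0]
out = gated_conv1d(features, w_lin=[1.0, 0.0, -1.0], w_gate=[0.0, 0.0, 0.0])
```

With an all-zero gate kernel, every gate value is sigmoid(0) = 0.5, so each output is half of the corresponding linear feature. In a model of the kind the abstract describes, the gated features would then be fed to a bidirectional LSTM, which this sketch leaves out.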