Robust Small-Footprint Keyword Spotting Using Sequence-To-Sequence Model With Connectionist Temporal Classifier

Xiaoguang Xuan, Mingjiang Wang, Xin Zhang, Fengjiao Sun
DOI: https://doi.org/10.1109/ICICSP48821.2019.8958609
2019-01-01
Abstract: Targeting the low computational complexity and frame-by-frame real-time processing required on small embedded devices, this paper proposes a small-footprint keyword spotting (KWS) system that uses a sequence-to-sequence model based on Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks with a Connectionist Temporal Classifier (CTC). The system uses per-channel energy normalization (PCEN) mel features, which make it more robust for noisy speech recognition. LSTM and GRU networks can not only retain contextual information across the speech sequence but also process speech frame by frame on low-computation devices. The sequence-to-sequence model outputs the pinyin posterior probability for each frame, and de-duplication is then applied to obtain the final recognition result. Experimental results show that a small single-layer sequence-to-sequence model achieves good results without increasing computational complexity compared with a deep neural network.
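The de-duplication step mentioned in the abstract is not specified in detail. For CTC-trained models it is commonly done with greedy decoding: take the most probable label in each frame, merge consecutive repeats, and drop the blank label. The following is a minimal sketch under that assumption, not the authors' implementation; the function name `ctc_greedy_decode`, the blank index, and the toy pinyin vocabulary are all hypothetical.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (not stated in the abstract)


def ctc_greedy_decode(frame_posteriors, blank=BLANK):
    """Collapse per-frame posteriors into a label sequence (greedy CTC decoding)."""
    # Pick the most probable label index for each frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_posteriors]
    # Merge runs of identical consecutive labels into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Remove blank labels, which only separate or pad real labels.
    return [label for label in collapsed if label != blank]


# Hypothetical three-symbol vocabulary and hand-made posteriors for illustration.
VOCAB = ["<blank>", "ni", "hao"]
posteriors = [
    [0.10, 0.80, 0.10],  # frame 1: "ni"
    [0.20, 0.70, 0.10],  # frame 2: "ni" (repeat, merged)
    [0.90, 0.05, 0.05],  # frame 3: blank
    [0.10, 0.10, 0.80],  # frame 4: "hao"
    [0.10, 0.20, 0.70],  # frame 5: "hao" (repeat, merged)
]
decoded = [VOCAB[i] for i in ctc_greedy_decode(posteriors)]
print(decoded)
```

Because each frame needs only an argmax and a comparison with the previous label, this kind of decoding can run frame by frame, which fits the low-computation setting the abstract describes.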