A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting

Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Zhengkun Tian, Chenghao Zhao, Cunhang Fan
DOI: https://doi.org/10.21437/Interspeech.2019-1676
2019-01-01
Abstract: Keyword spotting requires a small memory footprint to run on mobile devices. However, previous works still use several hundred thousand parameters to achieve good performance. To address this issue, we propose a time delay neural network with shared weight self-attention for small-footprint keyword spotting. Sharing weights reduces the number of self-attention parameters without degrading performance. The models are evaluated on the publicly available Google Speech Commands dataset. Our model has 12K parameters, about 1/20 of the 239K used by the state-of-the-art ResNet model. The proposed model achieves an error rate of 4.19%, which is comparable to that of the ResNet model.
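The abstract does not spell out how the weights are shared, so the following is only a minimal sketch under one assumption: a single projection matrix is reused for the query, key, and value of a single-head self-attention layer. The function names, the tensor sizes, and the scaled dot-product form are illustrative choices, not the authors' implementation.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def shared_weight_self_attention(frames, w_shared):
    """Single-head self-attention with one projection reused for Q, K, and V.

    frames:   T x d_in list of feature frames
    w_shared: d_in x d_att projection (the only trainable matrix here)
    """
    h = matmul(frames, w_shared)          # Q = K = V = h (assumed sharing scheme)
    d_att = len(w_shared[0])
    scale = 1.0 / math.sqrt(d_att)
    h_t = [list(col) for col in zip(*h)]  # transpose: d_att x T
    scores = matmul(h, h_t)               # T x T similarity between frames
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, h), weights


def projection_param_count(d_in, d_att, shared):
    """Projection parameters (biases ignored) for shared vs. separate Q/K/V."""
    return d_in * d_att * (1 if shared else 3)


random.seed(0)
T, d_in, d_att = 5, 8, 4  # illustrative sizes, not taken from the paper
frames = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(T)]
w_shared = [[random.gauss(0, 0.1) for _ in range(d_att)] for _ in range(d_in)]
out, attn = shared_weight_self_attention(frames, w_shared)
```

Under this assumption the projection cost falls from three matrices to one, which is the kind of saving the abstract attributes to weight sharing. The 12K total reported above also depends on the rest of the network, which this sketch does not model.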