End-to-End Speech Keyword Spotting System

Shenghua Hu, Hanyue Liu, Liang Xu, Jing Wang, Yujun Wang, Peng Gao, Weiji Zhuang
DOI: https://doi.org/10.1109/ICCSI58851.2023.10304048
2023-01-01
Abstract: Speech keyword spotting aims to detect a set of predefined keywords in a continuous speech stream, and is widely used in devices such as smartphones and automobiles. Building on end-to-end techniques from deep learning, this paper designs and implements an end-to-end speech keyword spotting algorithm. The algorithm first trains an acoustic model based on a deep neural network, which takes acoustic features as input and outputs the posterior probability of the wake-up word. The posterior probability is then smoothed to obtain a confidence score for the wake-up word. This pipeline avoids the traditional decoding process. In addition, the paper compares several neural network structures for the acoustic model, such as the time-delay neural network (TDNN) and the factorized time-delay neural network (TDNN-F). Controlled comparative experiments verify that the proposed end-to-end speech keyword spotting algorithm performs competitively with other popular approaches.
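The smoothing step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the smoothing method, so this example assumes a causal moving average over per-frame wake-up word posteriors, with the peak smoothed value taken as the confidence score. The function names `smooth_posteriors` and `detect_keyword`, the window length, and the threshold are all hypothetical choices made for illustration, not details from the paper.

```python
from typing import List, Tuple


def smooth_posteriors(posteriors: List[float], window: int = 3) -> List[float]:
    """Causal moving average over the most recent `window` frames.

    Assumption: the paper's actual smoothing scheme is not given in the
    abstract; a moving average is used here as a common stand-in.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    running_sum = 0.0
    for t, p in enumerate(posteriors):
        running_sum += p
        if t >= window:
            # Drop the frame that just left the window.
            running_sum -= posteriors[t - window]
        # Early frames are averaged over the frames seen so far.
        smoothed.append(running_sum / min(t + 1, window))
    return smoothed


def detect_keyword(
    posteriors: List[float], window: int = 3, threshold: float = 0.5
) -> Tuple[float, bool]:
    """Return (confidence, detected) for one stream of posteriors.

    The confidence is the peak smoothed posterior; `threshold` is a
    hypothetical operating point, not a value from the paper.
    """
    smoothed = smooth_posteriors(posteriors, window)
    confidence = max(smoothed, default=0.0)
    return confidence, confidence >= threshold


if __name__ == "__main__":
    # A sustained run of high posteriors versus a single-frame spike.
    print(detect_keyword([0.0, 0.0, 0.9, 0.9, 0.9, 0.0]))
    print(detect_keyword([0.0, 0.0, 0.9, 0.0, 0.0]))
```

Under these assumptions, a sustained run of high posteriors keeps a high confidence after smoothing, while an isolated single-frame spike is averaged down and falls below the threshold. No decoding graph or search is involved: the decision comes directly from the smoothed model output.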