An Efficient Keywords Spotting System with Speaker Verification Based on Binary Neural Networks

Yuan Lei, Dehui Kong, Ke Xu, Zhen Ren
DOI: https://doi.org/10.1109/iciba56860.2023.10165235
2023-01-01
Abstract: Keyword spotting plays an important role in enabling voice interfaces between humans and smart devices. It is challenging to build a real-time system with high wake-up accuracy and a low false alarm rate, especially when computational resources are limited. In this paper, to balance wake-up accuracy against false alarm rate, we employ a two-stage system that uses a primary keyword spotting stage and a secondary speaker verification stage. Speaker verification models based on deep neural networks require massive memory and computational resources, which makes them difficult to deploy on memory- and power-constrained devices. To solve this problem, we propose a speaker verification model based on binary neural networks (BNNs), in which floating-point weights and activations are compressed to 1 bit. To mitigate the performance degradation caused by information loss, the BNN structure is optimized by adding shortcuts that connect layers. Experimental results on text-dependent and text-independent datasets show that the proposed speaker verification model reaches the performance level of full-precision neural networks with much lower memory and computational cost. We integrate this model into a two-stage keyword spotting system, and experiments show that this system reduces the false alarm rate significantly while maintaining wake-up accuracy.
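The abstract does not give the network's layer definitions, so the following is only a minimal sketch of the general idea it describes: weights and activations are reduced to 1 bit, and a shortcut carries the real-valued input around the binary layer to limit information loss. The XNOR-and-popcount dot product is the arithmetic commonly used for BNNs and is assumed here, not confirmed as the authors' implementation. All function names (`binarize`, `pack_bits`, `xnor_popcount_dot`, `binary_layer_with_shortcut`) and the square-weight, identity-shortcut layout are hypothetical choices made for illustration.

```python
def binarize(values):
    """Map each real value to +1 or -1 (sign function; 0 is mapped to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack +1/-1 values into one integer, one bit per value (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR marks the positions where both vectors agree. Each agreement
    contributes +1 and each disagreement -1, so the dot product equals
    2 * (number of agreements) - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask
    matches = bin(agree).count("1")
    return 2 * matches - n


def binary_layer_with_shortcut(x, weights):
    """Binary fully connected layer with an identity shortcut (assumed layout).

    x       : list of n real-valued inputs
    weights : n rows of n real-valued weights (square, so the shortcut
              can add x[i] to output i without a projection)
    """
    n = len(x)
    a_bits = pack_bits(binarize(x))
    out = []
    for i, row in enumerate(weights):
        w_bits = pack_bits(binarize(row))
        # 1-bit dot product plus the real-valued shortcut term.
        out.append(xnor_popcount_dot(a_bits, w_bits, n) + x[i])
    return out


if __name__ == "__main__":
    x = [0.5, -1.2, 0.3, -0.1]
    weights = [
        [0.2, -0.4, 0.9, -0.7],
        [-0.1, 0.3, -0.5, 0.8],
        [0.1, 0.2, 0.3, 0.4],
        [0.6, -0.2, -0.3, -0.9],
    ]
    print(binary_layer_with_shortcut(x, weights))
```

In this sketch each weight costs one bit instead of a 32-bit float, and the multiply-accumulate loop collapses into a bitwise operation and a bit count, which is where the memory and compute savings claimed for BNNs come from. The shortcut term keeps the magnitude information that the sign function discards.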