Polishing the Classical Likelihood Ratio Test by Supervised Learning for Voice Activity Detection.

Tianjiao Xu, Hui Zhang, Xueliang Zhang
DOI: https://doi.org/10.21437/interspeech.2020-1177
2020-01-01
Abstract: Voice activity detection (VAD) is essential for speech signal processing systems, which require low computational cost and real-time processing. Likelihood ratio test (LRT) based VAD is a widely used and effective approach in many applications. However, it still struggles in low signal-to-noise ratio (SNR) conditions and in non-stationary noise. To cope with this challenge, we propose a supervised masking-based parameter estimation module with an adaptive threshold to improve the performance of a state-of-the-art LRT-based VAD. Moreover, with real-time processing in mind, we compare the proposed method with corresponding end-to-end supervised learning approaches at various model sizes. Experimental results show that the proposed method consistently outperforms both the existing LRT-based method and the end-to-end supervised learning approaches.