Towards Improving Statistical Model Based Voice Activity Detection.

Ming Tu,Xiang Xie,Yishan Jiao
DOI: https://doi.org/10.21437/interspeech.2014-370
2014-01-01
Abstract:Statistical model based voice activity detection (VAD) is commonly used in various speech related research and applications. In this paper, we try to improve the performance of statistical model based VAD via new feature extraction method. Our main innovation focuses on that we apply Mel-frequency subband coefficients with power-law nonlinearity as feature for statistical model based VAD instead of Discrete Fourier Transform (DFT) coefficients. This proposed feature is then modeled by Gaussian distribution. Performances of this method are comprehensively compared with existing methods. Meanwhile we also test power-law nonlinearity on existing methods. Experimental results prove that with proposed subband coefficients the performance of statistical model based VAD could be improved a lot. Power-law nonlinearity on DFT coefficients could also bring some improvement.
What problem does this paper attempt to address?