Voice Activity Detection Based on Conjugate Subspace Matching Pursuit and Likelihood Ratio Test

Shiwen Deng, Jiqing Han
DOI: https://doi.org/10.1186/1687-4722-2011-12
2011-01-01
Abstract: Most voice activity detection (VAD) schemes operate in the discrete Fourier transform (DFT) domain, classifying each sound frame as speech or noise based on its DFT coefficients. These coefficients serve as the VAD features, so their robustness strongly affects the performance of the VAD scheme. However, shortcomings of modeling a signal in the DFT domain can easily degrade VAD performance in noisy environments. Instead of using the DFT coefficients, this article presents a novel approach that uses the complex coefficients derived from a complex exponential atomic decomposition of the signal. Using a goodness-of-fit test, we show that these coefficients are suitably modeled by a Gaussian probability distribution. A statistical model is then employed to derive the decision rule from the likelihood ratio test. Experimental results show that the proposed VAD method performs better than VAD based on the DFT coefficients in various noise environments.
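To make the likelihood ratio test concrete, the following is a minimal sketch of a generic Gaussian-model decision rule applied to one frame of complex coefficients. It is not the article's exact rule: it assumes each coefficient is zero-mean complex Gaussian with variance `noise_var` under the noise hypothesis and `noise_var + speech_var` under the speech hypothesis, and it takes the coefficients as given rather than computing them by conjugate subspace matching pursuit. The function names, the variance inputs, and the default threshold are all hypothetical.

```python
import math


def log_likelihood_ratio(coeffs, noise_var, speech_var):
    """Mean log-likelihood ratio of one frame (illustrative sketch).

    coeffs     -- complex coefficients of the frame (assumed given)
    noise_var  -- assumed noise variance for each coefficient
    speech_var -- assumed speech variance for each coefficient
    """
    total = 0.0
    for x, lam_n, lam_s in zip(coeffs, noise_var, speech_var):
        xi = lam_s / lam_n            # a priori SNR
        gamma = abs(x) ** 2 / lam_n   # a posteriori SNR
        # log of the ratio of two zero-mean complex Gaussian densities
        total += gamma * xi / (1.0 + xi) - math.log1p(xi)
    return total / len(coeffs)


def is_speech(coeffs, noise_var, speech_var, threshold=0.0):
    """Declare speech when the mean log-likelihood ratio exceeds the threshold."""
    return log_likelihood_ratio(coeffs, noise_var, speech_var) > threshold


# Hypothetical frames: one with no energy, one with a strong coefficient.
quiet = is_speech([0j, 0j], [1.0, 1.0], [1.0, 1.0])
loud = is_speech([3 + 4j], [1.0], [1.0])
```

In practice the variances would have to be estimated from the signal, and the threshold tuned to trade off false alarms against missed speech. Both are fixed by hand here for illustration.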