Computational Auditory Scene Analysis Based Voice Activity Detection

Ming Tu, Xiang Xie, Xingyu Na
DOI: https://doi.org/10.1109/ICPR.2014.147
2014-01-01
Abstract: Voice activity detection (VAD) is an important component of many speech applications. In this paper, two VAD methods using novel features based on computational auditory scene analysis (CASA) are proposed. The first builds on statistical model-based VAD, using the cochleagram instead of discrete Fourier transform coefficients as the time-frequency representation. The second is a supervised method based on Gaussian mixture models (GMMs): gammatone frequency cepstral coefficients (GFCC) are extracted from the cochleagram and used to discriminate speech from noise in a noisy signal, with GMMs modeling the GFCC of speech and of noise. Both methods are evaluated in the framework of the multiple observation likelihood ratio test, and their performance is compared with several existing algorithms. The results demonstrate that CASA-based features outperform several traditional features in the VAD task, and the reasons for the superiority of the two proposed features are also investigated.
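To make the supervised method concrete, the following is a minimal sketch of a GMM-based likelihood ratio test pooled over several neighbouring frames. It is not the authors' implementation. It assumes the feature vectors (for example GFCC) have already been extracted and that the speech and noise GMMs have already been trained with diagonal covariances. The function names, the `context` and `threshold` parameters, and the choice to average (rather than sum) the per-frame log-likelihood ratios over the window are all illustrative assumptions.

```python
import numpy as np


def gmm_log_likelihood(X, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    X: (T, D) feature frames; weights: (K,); means, variances: (K, D).
    Returns an array of shape (T,).
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - means[None, :, :]                      # (T, K, D)
    # Log of the Gaussian normalising constant for each component.
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_comp = log_norm[None, :] - 0.5 * np.sum(
        diff ** 2 / variances[None, :, :], axis=2
    )                                                             # (T, K)
    log_weighted = np.log(weights)[None, :] + log_comp
    # Log-sum-exp over components, computed stably.
    m = log_weighted.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_weighted - m).sum(axis=1, keepdims=True)))[:, 0]


def multi_observation_lrt(X, speech_gmm, noise_gmm, context=2, threshold=0.0):
    """Frame-level speech/noise decisions from a windowed likelihood ratio.

    speech_gmm and noise_gmm are (weights, means, variances) tuples.
    Each frame's score is the mean log-likelihood ratio over the frames
    t - context .. t + context (an assumption made for this sketch; the
    window is truncated at the signal edges).
    """
    llr = gmm_log_likelihood(X, *speech_gmm) - gmm_log_likelihood(X, *noise_gmm)
    T = len(llr)
    scores = np.empty(T)
    for t in range(T):
        lo, hi = max(0, t - context), min(T, t + context + 1)
        scores[t] = llr[lo:hi].mean()
    return scores > threshold, scores
```

Calling `multi_observation_lrt(features, speech_gmm, noise_gmm)` returns a boolean array marking frames judged to be speech, together with the pooled scores. Pooling over a window trades a short delay at speech onsets and offsets for decisions that are less sensitive to single noisy frames.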