A robust voice activity detector based on Weibull and Gaussian Mixture distribution

Yuan Liang, Xianglong Liu, Mi Zhou, Yihua Lou, Baosong Shan
DOI: https://doi.org/10.1109/ICSPS.2010.5555230
2010-01-01
Abstract: In this paper, we focus on the observation and state duration distributions in hidden semi-Markov model (HSMM)-based voice activity detection (VAD). To perform robustly in noisy environments, acoustic features of the noisy speech are first extracted by a Mel-frequency cepstrum processor after the raw speech has been filtered with a modified Wiener filter. Based on statistics from the TIMIT database, we use Gaussian mixture distributions (GMD) for both the speech and non-speech states to relate the MFCC feature vectors to the state sequences. Unlike in an HMM, the transition probability in an HSMM is not constant but depends on the time elapsed in the last state; in this paper it is modeled by a Weibull distribution (WD). The final VAD decision is made by a likelihood ratio test (LRT) that incorporates prior knowledge of the state. An adaptive threshold is also used to achieve better detection results. Experiments on noisy speech data show that the proposed method performs more robustly and accurately than the standard ITU-T G.729B, AMR2, an HMM-based VAD, and a VAD using a Laplacian-Gaussian model.
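As an illustration of the duration-dependent transition described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a discrete-time view in which the state duration follows a Weibull distribution with survival function S(d) = exp(-(d/scale)^shape), and computes the probability that a state ends in the next frame given that it has already lasted d frames. The function name `leave_probability` and the parameter values are hypothetical.

```python
import math


def leave_probability(d, shape, scale):
    """P(state ends during frame d+1 | state has already lasted d frames).

    Assumes a Weibull duration model with survival function
    S(d) = exp(-(d / scale) ** shape), so the result is 1 - S(d+1) / S(d).
    The ratio is evaluated through the exponent difference to avoid
    dividing by a survival value that has underflowed to zero.
    """
    exponent_now = (d / scale) ** shape
    exponent_next = ((d + 1) / scale) ** shape
    return 1.0 - math.exp(-(exponent_next - exponent_now))


if __name__ == "__main__":
    # shape == 1 is the exponential (memoryless) case: the leave
    # probability does not depend on d, as in an ordinary HMM.
    # shape > 1 makes leaving more likely the longer the state has lasted.
    for d in (0, 5, 20):
        print(d, leave_probability(d, 1.0, 10.0), leave_probability(d, 2.0, 10.0))
```

With a shape parameter of 1 the leave probability is the same for every d, which corresponds to the constant transition probability of an HMM; with a shape parameter above 1 it grows with the elapsed time, which is the extra flexibility an HSMM duration model provides.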