DNN-based Voice Activity Detection for Speaker Recognition

Fanhu Bie, Zhiyong Zhang, Dong Wang, Thomas Fang Zheng
2015-01-01
Correspondence: biefh@cslt.riit.tsinghua.edu.cn
Center for Speech and Language Technology, Research Institute of Information Technology, Tsinghua University, ROOM 1-303, BLDG FIT, 100084 Beijing, China
Department of Computer Science and Technology, Tsinghua University, ROOM 1-303, BLDG FIT, 100084 Beijing, China

Abstract: Voice activity detection (VAD) plays an important role in speaker recognition. This paper proposes a novel VAD based on deep neural networks (DNNs). It harnesses the power of DNNs to learn speech patterns from a large labelled database designed for speech recognition and is therefore explicitly optimized to discriminate between speech and non-speech signals. More interestingly, the output of the DNN offers a noise prior, which may lend itself to a Bayesian treatment of noise uncertainty in speaker recognition. The experiments were conducted on the mismatched-microphone condition (C3) of the SRE08 core test. The DNN-based VAD achieved a relative reduction of 22.0% in equal error rate (EER) compared with a fine-tuned energy-based VAD. When the Bayesian approach was employed, additional gains were obtained, particularly in noisy conditions.
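The abstract contrasts an energy-based VAD with one driven by DNN outputs and reports results as EER. The Python sketch below illustrates those ingredients under stated assumptions; it is not the authors' implementation. The frame sizes, the "within a margin of the maximum" energy rule, the `silence_ids` set of non-speech output classes, and the 0.5 decision threshold are all hypothetical choices, and the DNN posteriors are taken as given rather than computed by a real acoustic model.

```python
import math
from typing import List, Sequence


def frame_log_energies(samples: Sequence[float], frame_len: int = 200,
                       hop: int = 80) -> List[float]:
    """Log energy per frame (assumed 200/80 samples = 25 ms/10 ms at 8 kHz)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # A small floor keeps log() finite on digital silence.
        energies.append(math.log(sum(x * x for x in frame) + 1e-10))
    return energies


def energy_vad(log_energies: Sequence[float], margin: float = 3.0) -> List[bool]:
    """Hypothetical energy baseline: a frame is speech if its log energy
    lies within `margin` of the utterance maximum."""
    threshold = max(log_energies) - margin
    return [e > threshold for e in log_energies]


def dnn_vad(posteriors: Sequence[Sequence[float]], silence_ids: Sequence[int],
            threshold: float = 0.5) -> List[bool]:
    """A frame is speech if 1 - P(non-speech classes) exceeds `threshold`.
    `posteriors` holds one per-class probability vector per frame;
    `silence_ids` (assumed) indexes the DNN's non-speech outputs."""
    decisions = []
    for frame in posteriors:
        p_nonspeech = sum(frame[i] for i in silence_ids)
        decisions.append(1.0 - p_nonspeech > threshold)
    return decisions


def equal_error_rate(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Approximate EER: try every observed score as a threshold and return
    the mean of FAR and FRR where the two are closest."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(target) | set(nontarget)):
        frr = sum(s < t for s in target) / len(target)        # misses
        far = sum(s >= t for s in nontarget) / len(nontarget)  # false alarms
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def relative_reduction(baseline: float, new: float) -> float:
    """Relative improvement, e.g. 0.22 means a 22% relative EER reduction."""
    return (baseline - new) / baseline


# Toy usage with made-up numbers (not results from the paper).
frames = frame_log_energies([0.0] * 400 + [1.0] * 400, frame_len=200, hop=200)
print(energy_vad(frames))                                               # [False, False, True, True]
print(dnn_vad([[0.9, 0.05, 0.05], [0.1, 0.6, 0.3]], silence_ids=[0]))   # [False, True]
print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))               # 0.0
```

Thresholding is only one way to use the DNN output. A soft variant would keep `1 - p_nonspeech` as a per-frame weight instead of a hard decision, which is closer in spirit to treating the DNN output as a prior; the paper's exact Bayesian formulation is not reproduced here.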