Boosted deep neural networks and multi-resolution cochleagram features for voice activity detection.

Xiao-Lei Zhang, DeLiang Wang
DOI: https://doi.org/10.21437/interspeech.2014-367
2014-01-01
Abstract: Voice activity detection (VAD) is an important frontend of many speech processing systems. In this paper, we describe a new VAD algorithm based on boosted deep neural networks (bDNNs). The proposed algorithm first generates multiple base predictions for a single frame from only one DNN and then aggregates the base predictions for a better prediction of the frame. Moreover, we employ a new acoustic feature, multi-resolution cochleagram (MRCG), that concatenates the cochleagram features at multiple spectrotemporal resolutions and shows superior speech separation results over many acoustic features. Experimental results show that bDNN-based VAD with the MRCG feature outperforms state-of-the-art VADs by a considerable margin.
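
The aggregation step can be illustrated with a minimal sketch. The abstract does not say how the base predictions are produced or combined, so the sketch rests on assumptions: a single model is assumed to output one speech probability for every frame in a context window centred on each frame, so that overlapping windows yield several predictions per frame, and those predictions are combined by plain averaging. The function name `aggregate_window_predictions`, the `half_width` parameter, and the 0.5 decision threshold are hypothetical.

```python
import numpy as np


def aggregate_window_predictions(window_preds, half_width):
    """Average overlapping per-window predictions into one score per frame.

    Assumption (not from the abstract): window_preds[t, k] is the model's
    speech probability for frame t + k - half_width, predicted from the
    context window centred on frame t.
    """
    n_frames, width = window_preds.shape
    if width != 2 * half_width + 1:
        raise ValueError("window width must equal 2 * half_width + 1")

    total = np.zeros(n_frames)
    count = np.zeros(n_frames)
    for k in range(width):
        offset = k - half_width
        # Window centres t whose target frame t + offset lies inside the signal.
        t_lo = max(0, -offset)
        t_hi = min(n_frames, n_frames - offset)
        total[t_lo + offset:t_hi + offset] += window_preds[t_lo:t_hi, k]
        count[t_lo + offset:t_hi + offset] += 1

    # Every frame receives at least its own centre prediction, so count > 0.
    return total / count


# Toy input: random numbers stand in for the outputs of a trained model.
rng = np.random.default_rng(0)
toy_preds = rng.uniform(size=(10, 5))           # 10 frames, window of 5 frames
scores = aggregate_window_predictions(toy_preds, half_width=2)
is_speech = scores >= 0.5                       # hypothetical decision threshold
```

Frames near the edges of the signal receive fewer base predictions than interior frames, which is why the sketch divides by a per-frame count instead of the window width.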