Noise Robust Voice Activity Detection Using Joint Phase and Magnitude Based Feature Enhancement.

Khomdet Phapatanaburi, Longbiao Wang, Zeyan Oo, Weifeng Li, Seiichi Nakagawa, Masahiro Iwahashi
DOI: https://doi.org/10.1007/s12652-017-0482-8
IF: 3.662
2017-01-01
Journal of Ambient Intelligence and Humanized Computing
Abstract: Recently, deep neural network (DNN)-based feature enhancement has been proposed for many speech applications, and DNN-enhanced features have achieved higher performance than raw features. However, phase information is discarded in most conventional DNN training. In this paper, we propose DNN-based joint phase- and magnitude-based feature (JPMF) enhancement (JPMF with DNN) and noise-aware training (NAT)-DNN-based JPMF enhancement (JPMF with NAT-DNN) for noise-robust voice activity detection (VAD). To further improve the performance of the proposed feature enhancement, the scores of the proposed phase- and magnitude-based features are also combined. Specifically, mel-frequency cepstral coefficients (MFCCs) and the mel-frequency delta phase (MFDP) are used as the magnitude and phase features, respectively. The experimental results show that the proposed feature enhancement significantly outperforms conventional magnitude-based feature enhancement. The proposed JPMF with NAT-DNN method achieves the best relative equal error rate (EER) improvement compared with individual magnitude- and phase-based DNN speech enhancement. Moreover, combining the scores of the enhanced MFCC and MFDP obtained with JPMF with NAT-DNN further improves VAD performance.
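Noise-aware training, as commonly described in the DNN literature, augments each input frame with an estimate of the noise so that the network can condition its enhancement on the noise condition. The following is a minimal sketch, assuming a simple estimator: the mean of the first few frames of an utterance, treated as speech-free. The abstract does not state which noise estimate the paper uses, and `nat_inputs` and `n_noise_frames` are hypothetical names chosen for illustration.

```python
from typing import List

Frame = List[float]


def nat_inputs(frames: List[Frame], n_noise_frames: int = 6) -> List[Frame]:
    """Append a per-utterance noise estimate to every frame (NAT-style input).

    Assumption (hypothetical): the first `n_noise_frames` frames contain no
    speech, so their mean is used as the noise estimate.
    """
    if not frames:
        return []
    k = min(n_noise_frames, len(frames))
    dim = len(frames[0])
    # Per-dimension mean over the assumed speech-free leading frames.
    noise = [sum(f[d] for f in frames[:k]) / k for d in range(dim)]
    # Each output frame is [original features | noise estimate].
    return [list(f) + noise for f in frames]


# Example: 2-dimensional features, noise estimated from the first 2 frames.
augmented = nat_inputs([[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]], n_noise_frames=2)
print(augmented)  # [[1.0, 2.0, 2.0, 3.0], [3.0, 4.0, 2.0, 3.0], [10.0, 10.0, 2.0, 3.0]]
```

The same augmentation would apply to either feature stream (MFCC or MFDP) before it enters the enhancement network; doubling the input dimension is the usual cost of this design.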
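The score combination and the EER metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes a weighted linear fusion of per-frame speech scores from an MFCC-based and an MFDP-based detector, with a hypothetical weight `alpha`; the abstract does not specify the paper's fusion rule or weight. The EER is taken at the decision threshold where the false-alarm rate (non-speech frames accepted as speech) and the miss rate (speech frames rejected) are closest, averaging the two rates there. The function names are illustrative, not from the paper.

```python
from typing import List, Tuple


def fuse_scores(mfcc_scores: List[float], mfdp_scores: List[float],
                alpha: float = 0.5) -> List[float]:
    """Weighted linear combination of per-frame speech scores.

    `alpha` is a hypothetical fusion weight, not a value reported in the paper.
    """
    if len(mfcc_scores) != len(mfdp_scores):
        raise ValueError("score lists must have equal length")
    return [alpha * m + (1.0 - alpha) * p for m, p in zip(mfcc_scores, mfdp_scores)]


def equal_error_rate(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (eer, threshold); labels use 1 = speech, 0 = non-speech.

    A frame is classified as speech when its score >= threshold.
    """
    n_speech = sum(labels)
    n_nonspeech = len(labels) - n_speech
    if n_speech == 0 or n_nonspeech == 0:
        raise ValueError("need both speech and non-speech frames")
    best = (float("inf"), 1.0, 0.0)  # (|FAR - FRR|, eer, threshold)
    for t in sorted(set(scores)):
        # False alarm: non-speech frame accepted as speech.
        fa = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        # Miss: speech frame rejected.
        miss = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        far, frr = fa / n_nonspeech, miss / n_speech
        gap = abs(far - frr)
        if gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Example with toy scores (not data from the paper).
fused = fuse_scores([0.9, 0.8, 0.3, 0.2, 0.7, 0.1], [0.7, 0.9, 0.4, 0.1, 0.6, 0.3])
eer, threshold = equal_error_rate(fused, [1, 1, 0, 0, 1, 0])
print(eer)  # 0.0, since the fused toy scores separate the two classes
```

Sweeping only the observed scores as thresholds keeps the sketch simple; with few frames the FAR and FRR curves rarely cross exactly, so the midpoint at the closest pair is a common approximation of the EER.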