A Speech Enhancement Algorithm By Iterating Single- And Multi-Microphone Processing And Its Application To Robust ASR

Xueliang Zhang, Zhong-Qiu Wang, DeLiang Wang
DOI: https://doi.org/10.1109/ICASSP.2017.7952161
2017-01-01
Abstract: We propose a speech enhancement algorithm based on single- and multi-microphone processing techniques. The core of the algorithm estimates a time-frequency mask that represents the target speech and uses masking-based beamforming to enhance the corrupted speech. Specifically, in single-microphone processing, the signals received by a microphone array are treated as individual signals, and we estimate a mask for each microphone's signal using a deep neural network (DNN). With these masks, in multi-microphone processing, we calculate a spatial covariance matrix of the noise and a steering vector for beamforming. In addition, we propose a masking-based post-filter to further suppress the noise in the beamformer output. The enhanced speech is then sent back to the DNN for mask re-estimation. After iterating these steps a few times, we obtain the final enhanced speech. The proposed algorithm is evaluated as a frontend for automatic speech recognition (ASR) and achieves a 5.05% average word error rate (WER) on the real environment test set of CHiME-3, outperforming the current best algorithm by 13.34%.