Robust Mask Estimation by Integrating Neural Network-Based and Clustering-Based Approaches for Adaptive Acoustic Beamforming.

Ying Zhou, Yanmin Qian
DOI: https://doi.org/10.1109/icassp.2018.8462462
2018-01-01
Abstract: Recently, the mask-based beamforming approach has received tremendous interest and has been widely studied for multi-channel noise-robust automatic speech recognition (ASR). Among the known mask estimation models, the neural-network-based (NN-based) approach has received the most attention and achieves competitive performance. However, this approach still suffers from a mismatch between the simulated training data and the real test data. This paper proposes a new unsupervised scheme that can utilize real data during training of the NN-based mask estimator. A clustering-based approach is first applied to the real data to generate soft masks, which are then used as labels for NN mask modeling. Moreover, acoustic adaptation techniques are borrowed from conventional back-end acoustic modeling and applied to the front-end NN-mask-based beamformer, further reducing the training-testing acoustic mismatch. The proposed methods are evaluated on the CHiME-4 dataset. Experimental results show that the proposed strategies reduce the mismatch significantly, yielding a relative WER reduction of about 15.0% compared with the conventional NN-mask beamformer on the real data under noisy conditions.
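The abstract does not state which beamformer consumes the estimated masks. As a rough illustration of how a time-frequency mask drives beamforming, the sketch below assumes a mask-based MVDR beamformer in the Souden formulation, a common choice in mask-based front ends, and is not presented as the authors' implementation. The function names `masked_scm`, `mvdr_weights` and `beamform` are hypothetical. In the proposed scheme the speech mask would come from the NN estimator (trained on clustering-generated soft labels); here it is simply an input array.

```python
import numpy as np

def masked_scm(stft, mask):
    """Mask-weighted spatial covariance matrix for each frequency bin.

    stft: complex array (F, T, M) of multi-channel STFT coefficients.
    mask: real array (F, T) with values in [0, 1] (hypothetical NN output).
    Returns a complex array (F, M, M).
    """
    # Sum of mask(f, t) * y(f, t) y(f, t)^H over frames t
    weighted = np.einsum("ft,ftm,ftn->fmn", mask, stft, stft.conj())
    # Normalize by the total mask weight per frequency (guard against zero)
    norm = np.maximum(mask.sum(axis=1), 1e-10)[:, None, None]
    return weighted / norm

def mvdr_weights(phi_s, phi_n, ref=0):
    """Souden-style MVDR: w = Phi_n^{-1} Phi_s u_ref / trace(Phi_n^{-1} Phi_s)."""
    num = np.linalg.solve(phi_n, phi_s)  # batched solve, shape (F, M, M)
    trace = np.trace(num, axis1=-2, axis2=-1)[:, None, None]
    # Selecting column `ref` is the product with the unit vector u_ref
    return (num / trace)[..., ref]  # shape (F, M)

def beamform(stft, weights):
    """Apply x_hat(f, t) = w(f)^H y(f, t) for every time-frequency bin."""
    return np.einsum("fm,ftm->ft", weights.conj(), stft)
```

A typical (assumed) usage is `phi_s = masked_scm(Y, speech_mask)`, `phi_n = masked_scm(Y, 1 - speech_mask)`, then `beamform(Y, mvdr_weights(phi_s, phi_n))`. Because the beamformer only sees the masks through these covariance estimates, errors in the mask caused by training-testing mismatch propagate directly into the beamforming weights, which is the mismatch the paper targets.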