Joint Training of Complex Ratio Mask Based Beamformer and Acoustic Model for Noise Robust ASR

Yong Xu, Chao Weng, Like Hui, Jianming Liu, Meng Yu, Dan Su, Dong Yu
DOI: https://doi.org/10.1109/icassp.2019.8682576
2019-01-01
Abstract: In this paper, we present a framework for jointly training a multi-channel beamformer and an acoustic model for noise-robust automatic speech recognition (ASR). The complex ratio mask (CRM), which has been demonstrated to be more effective than the ideal ratio mask (IRM), is proposed for estimating the covariance matrix for the beamformer. Both the Minimum Variance Distortionless Response (MVDR) beamformer and the Generalized Eigenvalue (GEV) beamformer are investigated under the CRM-based joint training architecture. We also propose a robust mask pooling strategy across multiple channels. A long short-term memory (LSTM) based language model is used to re-score hypotheses, which further improves overall performance. We evaluate the proposed methods on the CHiME-4 challenge dataset. The CRM-based system achieves a 10% relative reduction in word error rate (WER) compared with the IRM-based system. Without sequence discriminative training, our best single system already achieves an average WER of 2.72% on the test set, which is comparable to the state of the art.
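As a rough illustration of mask-based covariance estimation feeding an MVDR beamformer, the NumPy sketch below works on a single frequency bin. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the per-channel complex mask layout, the diagonal loading term, and the reference-channel form of the MVDR solution are all choices made here for illustration, and the abstract does not specify them. The mask pooling strategy and the joint training with the acoustic model are not shown.

```python
import numpy as np

def masked_covariance(stft, mask):
    """Spatial covariance of a mask-filtered signal for one frequency bin.

    stft: (C, T) complex STFT coefficients (C channels, T frames).
    mask: (C, T) complex mask (hypothetical layout; a real-valued mask also works).
    Returns a (C, C) Hermitian matrix.
    """
    est = mask * stft                          # element-wise complex masking
    return est @ est.conj().T / stft.shape[1]  # average of outer products over frames

def mvdr_weights(phi_s, phi_n, ref=0, eps=1e-6):
    """MVDR weights in the reference-channel form (an assumed formulation):
    w = (phi_n^{-1} phi_s)[:, ref] / trace(phi_n^{-1} phi_s).
    """
    C = phi_n.shape[0]
    # Diagonal loading keeps the noise covariance well conditioned (assumption).
    phi_n = phi_n + eps * np.trace(phi_n).real / C * np.eye(C)
    num = np.linalg.solve(phi_n, phi_s)        # phi_n^{-1} phi_s without explicit inverse
    return num[:, ref] / np.trace(num)

def beamform(weights, stft):
    """Apply w^H y to every frame; returns a (T,) enhanced signal."""
    return weights.conj() @ stft
```

In use, one would call `masked_covariance` twice per frequency bin, once with a speech mask and once with a noise mask, pass the two results to `mvdr_weights`, and apply `beamform` to the multi-channel STFT. How the noise mask is obtained from the CRM is left open here because the abstract does not say.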