LSTM-Based Iterative Mask Estimation and Post-Processing for Multi-Channel Speech Enhancement

Yan-Hui Tu, Jun Du, Nan Zhou, Chin-Hui Lee
DOI: https://doi.org/10.1109/apsipa.2017.8282081
2017-01-01
Abstract: We recently proposed an iterative mask estimation (IME) approach that improves conventional complex Gaussian mixture model (CGMM) based beamforming and yielded the best multi-channel speech recognition accuracy in the CHiME-4 challenge. In this study, we focus on multi-channel speech enhancement and present a novel approach based on long short-term memory (LSTM) IME and post-processing. First, an LSTM is used to estimate the ideal ratio mask (IRM), which refines the mask estimated by a CGMM. Then, the improved mask is used to derive a beamformer. Finally, the IME-based beamformed speech is processed by an LSTM-based regression model. Experiments on the CHiME-4 simulation data show that the LSTM-based IME approach improves PESQ over the unprocessed signals, with relative improvements of 17.33% and 13.89%, and that the LSTM-based post-processing yields further gains over the IME approach, with relative PESQ improvements of 11.42% and 10.00%, for the 6-channel and 2-channel cases, respectively.
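The step from an estimated mask to a beamformer can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the beamformer type or the exact IRM definition, so the power-ratio IRM, the reference-channel MVDR formulation, and all function names and parameters below (ideal_ratio_mask, masked_covariance, mvdr_weights, beamform, diag_load, ref) are assumptions chosen for illustration.

```python
import numpy as np

def ideal_ratio_mask(speech_power, noise_power, eps=1e-12):
    """Power-ratio IRM per time-frequency bin (one common variant, assumed here)."""
    return speech_power / (speech_power + noise_power + eps)

def masked_covariance(Y, mask, eps=1e-12):
    """Mask-weighted spatial covariance matrices.

    Y:    complex STFT of shape (F, T, C) = (frequency, frame, channel)
    mask: real-valued mask of shape (F, T) with values in [0, 1]
    Returns an array of shape (F, C, C).
    """
    # Sum over frames of mask * y y^H, separately for each frequency bin.
    num = np.einsum('ft,ftc,ftd->fcd', mask, Y, Y.conj())
    return num / (mask.sum(axis=1)[:, None, None] + eps)

def mvdr_weights(phi_s, phi_n, ref=0, diag_load=1e-6, eps=1e-12):
    """Reference-channel MVDR weights (an assumed choice of beamformer).

    w = (Phi_n^{-1} Phi_s / trace(Phi_n^{-1} Phi_s)) u_ref
    """
    C = phi_n.shape[-1]
    # Diagonal loading keeps the noise covariance invertible.
    phi_n = phi_n + diag_load * np.eye(C)[None, :, :]
    numer = np.linalg.solve(phi_n, phi_s)              # (F, C, C)
    trace = np.trace(numer, axis1=1, axis2=2)[:, None]  # (F, 1)
    return numer[:, :, ref] / (trace + eps)             # (F, C)

def beamform(Y, w):
    """Apply w^H y to every time-frequency bin; returns shape (F, T)."""
    return np.einsum('fc,ftc->ft', w.conj(), Y)
```

In such a sketch, the speech covariance would typically be computed with the estimated speech mask and the noise covariance with its complement (1 - mask), after which beamform(Y, mvdr_weights(phi_s, phi_n)) gives a single-channel enhanced spectrogram that a post-processing model could refine further.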