Joint Noise and Mask Aware Training for DNN-based Speech Enhancement with Sub-band Features

Qing Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee
DOI: https://doi.org/10.1109/hscma.2017.7895570
2017-01-01
Abstract: We present a joint noise and mask aware training strategy for deep neural network (DNN) based speech enhancement with sub-band features. First, an analysis of the previously proposed dynamic noise aware training approach on wide-band (16 kHz) speech data shows that full-band dynamic noise features do not always improve enhancement performance, because the noise estimate is inaccurate. Accordingly, we improve dynamic noise estimation via enhanced post-processing, interpolation with the static noise estimate, and sub-band features. Then, the ideal ratio mask (IRM), a relative quantity that describes both speech and noise information, is shown to be strongly complementary to dynamic noise estimation when both are used in joint aware training of the DNN. Furthermore, we conduct a comprehensive study of different approaches to estimating noise and the IRM. Experiments under unseen noises demonstrate the effectiveness of the proposed approach on both speech quality and intelligibility measures in comparison with the conventional DNN approach.
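The quantities named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the power-ratio form of the IRM, the linear interpolation weight `alpha`, and the uniform grouping of frequency bins into sub-bands are all assumptions made here for illustration, and the function names are hypothetical.

```python
import numpy as np


def ideal_ratio_mask(speech_power, noise_power, eps=1e-12):
    """Assumed IRM form: speech power over (speech + noise) power per time-frequency bin."""
    return speech_power / (speech_power + noise_power + eps)


def interpolate_noise(static_est, dynamic_est, alpha=0.5):
    """Assumed linear interpolation of a static (per-utterance) noise estimate
    of shape (bins,) with a dynamic (per-frame) estimate of shape (frames, bins)."""
    return alpha * static_est[np.newaxis, :] + (1.0 - alpha) * dynamic_est


def subband_average(features, num_bands):
    """Assumed sub-band features: average adjacent frequency bins into
    num_bands groups, mapping (frames, bins) to (frames, num_bands)."""
    bands = np.array_split(features, num_bands, axis=1)
    return np.stack([band.mean(axis=1) for band in bands], axis=1)


# Toy data: 2 frames, 4 frequency bins (made-up values, not from the paper).
speech = np.array([[3.0, 1.0, 0.0, 2.0],
                   [1.0, 1.0, 4.0, 0.0]])
noise = np.array([[1.0, 1.0, 1.0, 2.0],
                  [1.0, 3.0, 4.0, 1.0]])

irm = ideal_ratio_mask(speech, noise)           # values in [0, 1]
static_noise = noise.mean(axis=0)               # one estimate per utterance
mixed_noise = interpolate_noise(static_noise, noise, alpha=0.5)
subband_noise = subband_average(mixed_noise, num_bands=2)
```

In a joint aware training setup of the kind the abstract describes, vectors such as `subband_noise` and an estimated mask would presumably be appended to the noisy input features of the DNN; how the paper estimates and combines them is not stated in the abstract.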