Using Speech Enhancement Preprocessing for Speech Emotion Recognition in Realistic Noisy Conditions

Hengshun Zhou, Jun Du, Yan-Hui Tu, Chin-Hui Lee
DOI: https://doi.org/10.21437/interspeech.2020-2472
2020-01-01
Abstract: In this study, we investigate the effects of deep learning (DL)-based speech enhancement (SE) on speech emotion recognition (SER) in realistic environments. First, we use emotional speech data to train regression-based speech enhancement models, which are shown to be beneficial to noisy speech emotion recognition. Next, to improve the generalization capability of the regression model, an LSTM architecture whose hidden layers are designed via simple densely connected progressive learning is adopted for the enhancement model. Finally, a post-processor that uses an improved speech presence probability to estimate masks from the proposed LSTM structure is shown to further improve recognition accuracy. Experimental results on the IEMOCAP and CHEAVD 2.0 corpora demonstrate that the proposed framework yields consistent and significant improvements over systems using unprocessed noisy speech.