Combine Waveform and Spectral Methods for Single-channel Speech Enhancement

Miao Li, Hui Zhang, Xueliang Zhang
DOI: https://doi.org/10.23919/apsipaasc55919.2022.9979916
2022-01-01
Abstract: Many speech enhancement methods map noisy spectrograms to clean ones, so phase information is required to convert the enhanced spectrogram back to a waveform. How to obtain that phase is a critical problem. In this work, we obtain the phase with a waveform enhancement approach based on a dual-path recurrent neural network (DP-RNN), an architecture that has significantly improved speaker separation performance. We adapt the DP-RNN to the speech enhancement task and propose a more lightweight dual-path bidirectional long short-term memory (DP-BiLSTM) network to reduce the high complexity of forward computation in the conventional DP-RNN. Specifically, a DP-BiLSTM is trained for waveform mapping, and its output is split into two branches for further processing. In the first branch, the output is converted into a magnitude spectrum and phase, and the phase is saved. In the second branch, the output is added to the original noisy speech, and the spectrum of the sum is sent to a separate spectral masking LSTM for further enhancement. The waveform and spectral enhancement networks are trained jointly. In the testing stage, the phase estimated by the waveform network is combined with the enhanced spectrogram to recover the waveform. In the proposed framework, both the spectrogram and the phase are improved. Experimental results show that the proposed method improves performance with a small model size and less computation. The model has only around 100K parameters.
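The test-stage recombination described in the abstract, where the phase comes from the waveform network and the magnitude from the spectral network, can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function names (`stft`, `istft`, `recombine`), the Hann window, and the frame settings `n_fft=512` and `hop=128` are all assumptions chosen for the example, and both networks are treated as black boxes whose outputs are passed in as arrays.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=512, hop=128):
    """Weighted overlap-add inverse of the stft() above."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = n_fft + hop * (len(spec) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += window ** 2
    # Guard against division by ~0 at the signal edges, where the window vanishes.
    return out / np.maximum(norm, 1e-8)

def recombine(enhanced_magnitude, waveform_estimate, n_fft=512, hop=128):
    """Pair a magnitude spectrogram (assumed: spectral-network output) with the
    phase of a time-domain estimate (assumed: waveform-network output)."""
    phase = np.angle(stft(waveform_estimate, n_fft, hop))
    return istft(enhanced_magnitude * np.exp(1j * phase), n_fft, hop)
```

As a sanity check, passing the magnitude and the waveform of the same signal should reproduce that signal away from the edges, for example `recombine(np.abs(stft(x)), x)` for a signal `x` whose length fits the frame grid. In the setting the abstract describes, the two arguments would instead come from the two jointly trained networks.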