Phase-Aware Speech Enhancement Based on Deep Neural Networks

Naijun Zheng,Xiao-Lei Zhang
DOI: https://doi.org/10.1109/taslp.2018.2870742
2019-01-01
IEEE/ACM Transactions on Audio Speech and Language Processing
Abstract:Short-time frequency transform (STFT) is fundamental in speech processing. Because of the difficulty of processing highly unstructured STFT phase, most speech-processing algorithms only operate with STFT magnitude, leaving the STFT phase far from explored. However, with the recent development of deep neural network (DNN) based speech processing, e.g., speech enhancement and recognition, phase processing is becoming more important than ever before as a new growing point of DNN-based methods. In this paper, we propose a phase-aware speech enhancement algorithm based on DNN. Specifically, in the training stage, when incorporating phase as a target, our core idea is to transform an unstructured phase spectrogram to its derivative along the time axis, i.e., instantaneous frequency deviation (IFD), which has a similar structure with its corresponding magnitude spectrogram. We further propose to optimize both IFD and magnitude jointly in a multiobjective learning framework. In the test stage, we propose a postprocessing method to recover the phase spectrogram from the estimated IFD. Experimental results demonstrate the effectiveness of the proposed method.
What problem does this paper attempt to address?