Speech Enhancement Integrating the MVDR Beamforming and T-F Masking

Jinru Zhu, Changchun Bao, Rui Cheng
DOI: https://doi.org/10.1109/icspcc46631.2019.8960879
2019-01-01
Abstract: This paper proposes a multi-channel speech enhancement method that combines minimum variance distortionless response (MVDR) beamforming with time-frequency (T-F) masking. First, the logarithmic power spectrum (LPS) features of the multi-channel signals are fed to a deep neural network (DNN) model that estimates a T-F mask for the reference microphone. Then, the estimated mask is used to calculate the speech covariance matrix, from which a steering vector is estimated by generalized eigenvalue decomposition (GEVD) to construct the MVDR beamformer. Finally, the beamformer output is post-processed by a DNN-based ideal ratio mask (IRM) model. The method is evaluated with the perceptual evaluation of speech quality (PESQ) and the segmental signal-to-noise ratio (SSNR). The experimental results show that the proposed method increases both PESQ and SSNR.
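The mask-to-beamformer steps described in the abstract can be sketched for a single frequency bin as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the T-F mask is passed in as an array rather than produced by a DNN, the noise covariance is assumed to be weighted by `1 - mask`, and the steering vector is taken as the noise covariance times the principal generalized eigenvector, which is one common GEVD-based choice that the abstract does not confirm. All function names are hypothetical.

```python
import numpy as np

def masked_covariance(Y, mask):
    """Mask-weighted spatial covariance for one frequency bin.

    Y: complex STFT, shape (channels, frames); mask: real weights, shape (frames,).
    """
    weighted = Y * mask  # the mask broadcasts over channels
    return (weighted @ Y.conj().T) / max(mask.sum(), 1e-10)

def gevd_steering_vector(phi_s, phi_n, ref=0):
    """Steering vector from the principal generalized eigenvector of (phi_s, phi_n).

    Assumes phi_n is Hermitian positive definite and the result is nonzero
    at the reference microphone `ref`.
    """
    L = np.linalg.cholesky(phi_n)            # phi_n = L L^H
    L_inv = np.linalg.inv(L)
    A = L_inv @ phi_s @ L_inv.conj().T       # whitened speech covariance
    A = (A + A.conj().T) / 2                 # enforce Hermitian symmetry
    _, eigvecs = np.linalg.eigh(A)           # eigenvalues in ascending order
    u = eigvecs[:, -1]                       # principal eigenvector of A
    d = L @ u                                # equals phi_n @ v with v = L^{-H} u
    return d / d[ref]                        # normalize to the reference microphone

def mvdr_weights(phi_n, d):
    """MVDR weights w = phi_n^{-1} d / (d^H phi_n^{-1} d)."""
    num = np.linalg.solve(phi_n, d)
    return num / (d.conj() @ num)

def beamform(w, Y):
    """Apply w^H to every frame; returns shape (frames,)."""
    return w.conj() @ Y
```

Under these assumptions, a caller would compute `phi_s = masked_covariance(Y, mask)` and `phi_n = masked_covariance(Y, 1 - mask)` for each frequency bin, then chain `gevd_steering_vector`, `mvdr_weights`, and `beamform`. The DNN-based IRM post-processing stage is outside the scope of this sketch.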