DeepMMSE: A Deep Learning Approach to MMSE-Based Noise Power Spectral Density Estimation

Qiquan Zhang,Aaron Nicolson,Mingjiang Wang,Kuldip Paliwal,Chenxu Wang
DOI: https://doi.org/10.1109/taslp.2020.2987441
2020-01-01
IEEE/ACM Transactions on Audio Speech and Language Processing
Abstract:An accurate noise power spectral density (PSD) tracker is an indispensable component of a single-channel speech enhancement system. Bayesian-motivated minimum mean-square error (MMSE)-based noise PSD estimators have been the most prominent in recent time. However, they lack the ability to track highly non-stationary noise sources due to current methods of a priori signal-to-noise (SNR) estimation. This is caused by the underlying assumption that the noise signal changes at a slower rate than the speech signal. As a result, MMSE-based noise PSD trackers exhibit a large tracking delay and produce noise PSD estimates that require bias compensation. Motivated by this, we propose an MMSE-based noise PSD tracker that employs a temporal convolutional network (TCN) a priori SNR estimator. The proposed noise PSD tracker, called DeepMMSE makes no assumptions about the characteristics of the noise or the speech, exhibits no tracking delay, and produces an accurate estimate that requires no bias correction. Our extensive experimental investigation shows that the proposed DeepMMSE method outperforms state-of-the-art noise PSD trackers and demonstrates the ability to track abrupt changes in the noise level. Furthermore, when employed in a speech enhancement framework, the proposed DeepMMSE method is able to outperform state-of-the-art noise PSD trackers, as well as multiple deep learning approaches to speech enhancement. Availability: DeepMMSE is available at: https://github.com/anicolson/DeepXi.
What problem does this paper attempt to address?