Improving Monaural Speech Enhancement with Dynamic Scene Perception Module

Tian Lan, Jiajia Li, Wenxin Tai, Cong Chen, Jun Kang, Qiao Liu
DOI: https://doi.org/10.1109/ICME52920.2022.9858924
2022-01-01
Abstract: Speech enhancement aims to recover clean speech from complex noise backgrounds. This paper proposes a novel information processing module, dubbed the dynamic scene perception module (DSPM), that helps existing systems accommodate a variety of complex scenarios. DSPM is motivated by the observation that different regions of the noisy spectrum have different enhancement requirements in different scenarios. Concretely, DSPM consists of two parts: one for dynamic scene estimation and the other for adaptive region perception. The scene estimator uses a spectrum-energy-based attention mechanism to obtain a coefficient for each convolution kernel. Then, at each position, the region perceptron chooses the corresponding kernels according to the requirements of the current region (preserving vocals or suppressing noise). Systematic evaluations on the TIMIT corpus and on Voice Bank + DEMAND demonstrate the effectiveness of the method. Compared with existing systems, the proposed method achieves better performance under various SNR conditions and in complex noise scenarios.
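The two-part structure described in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the abstract gives no layer sizes, feature definitions, or combination rule, so everything below is an assumption made for illustration. In particular, the function name `dspm_sketch`, the projection `w_scene`, the per-kernel scale `w_region`, the use of time-averaged per-frequency energy for the scene attention, and the multiplicative combination of scene and region weights are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def conv2d_same(x, kernel):
    """Naive zero-padded 'same' 2-D cross-correlation; kernel sides must be odd."""
    kf, kt = kernel.shape
    padded = np.pad(x, ((kf // 2, kf // 2), (kt // 2, kt // 2)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(kf):
        for j in range(kt):
            out += kernel[i, j] * padded[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def dspm_sketch(spec, kernels, w_scene, w_region):
    """Hypothetical illustration of scene estimation plus region perception.

    spec:     (F, T) noisy magnitude spectrogram
    kernels:  (K, kf, kt) candidate convolution kernels
    w_scene:  (F, K) assumed projection from spectral energy to kernel logits
    w_region: (K,) assumed per-kernel scale applied to local log-energy
    """
    # Scene estimation (assumed form): attention over kernels computed from
    # the time-averaged energy of each frequency bin.
    energy = (spec ** 2).mean(axis=1)                  # (F,)
    scene_coef = softmax(energy @ w_scene)             # (K,), sums to 1

    # Response of every candidate kernel at every time-frequency position.
    responses = np.stack([conv2d_same(spec, k) for k in kernels])  # (K, F, T)

    # Region perception (assumed form): per-position soft choice of kernels
    # driven by local log-energy, modulated by the scene coefficients.
    local = np.log1p(spec ** 2)                        # (F, T)
    region = softmax(w_region[:, None, None] * local[None], axis=0)
    weights = scene_coef[:, None, None] * region
    weights = weights / weights.sum(axis=0, keepdims=True)

    enhanced = (weights * responses).sum(axis=0)       # (F, T)
    return enhanced, scene_coef, weights

# Example call on random data (shapes only; values are arbitrary).
rng = np.random.default_rng(0)
spec = np.abs(rng.normal(size=(16, 20)))
kernels = rng.normal(scale=0.1, size=(4, 3, 3))
enhanced, scene_coef, weights = dspm_sketch(
    spec, kernels, rng.normal(scale=0.1, size=(16, 4)), rng.normal(size=4)
)
```

In this sketch the scene coefficients are shared across the whole spectrogram, while the region weights vary per time-frequency position, which mirrors the global-versus-local split the abstract describes. A trained system would learn the kernels and projections rather than draw them at random.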