A Human Auditory Perception Loss Function Using Modified Bark Spectral Distortion for Speech Enhancement

Xiaofeng Shu, Yi Zhou, Hongqing Liu, Trieu-Kien Truong
DOI: https://doi.org/10.1007/s11063-020-10212-z
IF: 2.565
2020-03-03
Neural Processing Letters
Abstract: Human listeners often have difficulty understanding speech in the presence of background noise in everyday communication environments. Recently, deep neural network (DNN)-based techniques have been successfully applied to speech enhancement and have achieved significant improvements over conventional approaches. However, existing DNN-based methods usually minimize a log-power-spectrum-based or masking-based mean squared error (MSE) between the enhanced output and the training target (e.g., the ideal ratio mask (IRM) of the clean speech), a criterion that is not closely related to human auditory perception. In this letter, a modified Bark spectral distortion loss function, which can be considered an auditory perception-based MSE, is proposed to replace the conventional MSE in DNN-based speech enhancement approaches and further improve objective perceptual quality. Experimental results show that, compared with DNN-based methods using the conventional MSE criterion, the proposed method improves speech enhancement performance in all experimental settings, especially in terms of objective perceptual quality.
computer science, artificial intelligence
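To make the idea of an "auditory perception-based MSE" concrete, the following is a minimal NumPy sketch of a Bark-domain spectral error. It is not the authors' modified Bark spectral distortion, whose exact definition is not given in the abstract. Everything here is an assumption chosen for illustration: the Zwicker-Terhardt Hz-to-Bark approximation, rectangular non-overlapping bands, the band count of 18, the loudness-style compression exponent of 0.23, and the function names hz_to_bark, bark_band_matrix, and bark_spectral_mse.

```python
import numpy as np


def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale (assumed here)."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)


def bark_band_matrix(n_fft, sample_rate, n_bands=18):
    """Build a (n_bands, n_fft // 2 + 1) matrix that sums FFT bins into
    equal-width Bark bands. Rectangular bands are an illustrative choice."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bark = hz_to_bark(freqs)
    edges = np.linspace(0.0, bark[-1], n_bands + 1)
    # Index of the band each FFT bin falls into; the top edge is clipped
    # into the last band.
    band_idx = np.clip(np.searchsorted(edges, bark, side="right") - 1, 0, n_bands - 1)
    weights = np.zeros((n_bands, freqs.size))
    weights[band_idx, np.arange(freqs.size)] = 1.0
    return weights


def bark_spectral_mse(clean_power, enhanced_power, weights, exponent=0.23, eps=1e-10):
    """MSE between compressed Bark-band energies of two power spectrograms.

    clean_power, enhanced_power: arrays of shape (frames, n_fft // 2 + 1).
    exponent: loudness-style compression (hypothetical value for this sketch).
    """
    clean_bark = (clean_power @ weights.T + eps) ** exponent
    enhanced_bark = (enhanced_power @ weights.T + eps) ** exponent
    return float(np.mean((clean_bark - enhanced_bark) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = bark_band_matrix(n_fft=512, sample_rate=16000)
    clean = rng.random((10, 257))
    noisy = clean + 0.5 * rng.random((10, 257))
    print("Bark-domain error:", bark_spectral_mse(clean, noisy, W))
```

Compared with a plain MSE over FFT bins, this kind of criterion pools energy into bands that are narrow at low frequencies and wide at high frequencies, so errors are weighted closer to how frequency resolution varies in hearing. In a DNN training setup the same computation would be written with a differentiable framework in place of NumPy.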