Research on Speech Separation Technology Based on Deep Learning

Yan Zhou, Heming Zhao, Jie Chen, Xinyu Pan
DOI: https://doi.org/10.1007/s10586-018-2013-6
2018-01-01
Cluster Computing
Abstract: To address the instability of traditional speech separation algorithms, this paper proposes a deep-learning-based model for separating reverberant speech. Computational auditory scene analysis is used to simulate human auditory perception, and the target speech signal is extracted according to the ideal binary mask principle. Because deep neural networks (DNNs) have shown strong learning ability in speech recognition and artificial intelligence, a DNN model is proposed that performs dereverberation and denoising by learning the spectral mapping from "contaminated" speech to clean speech. A series of spectral features is extracted, and the temporal dynamics of adjacent frames are fused into them. The DNN transforms the encoded spectrum to restore the clean speech spectrum, from which the time-domain signal is finally reconstructed. In addition, the feature classification ability of the DNN is used to separate binaural reverberant speech. The binaural features, interaural time difference (ITD) and interaural level difference (ILD), are fused with the monaural gammatone frequency cepstral coefficient (GFCC) features to form a long feature vector, and the DNN is pre-trained with restricted Boltzmann machines (RBMs) to perform the classification task. The results show that the proposed model improves the quality and intelligibility of the separated speech and significantly enhances the stability of the system.
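The mask-based extraction step mentioned in the abstract can be illustrated with an ideal binary mask, the standard formulation in computational auditory scene analysis: a time-frequency unit is kept when the target dominates the interference and is discarded otherwise. The following is a minimal sketch, not the authors' implementation. The function names `ideal_binary_mask` and `apply_mask`, the 0 dB local criterion, and the assumption that per-unit target and interference energies are known in advance are all illustrative choices that do not come from the paper.

```python
import math


def ideal_binary_mask(target_energy, interference_energy, lc_db=0.0):
    """Return a 0/1 mask with one entry per time-frequency unit.

    A unit is kept (1) when its local SNR in dB exceeds the local
    criterion lc_db, and discarded (0) otherwise. The 0 dB default
    is an assumed, commonly used value, not one taken from the paper.
    """
    eps = 1e-12  # guards against log(0) and division by zero
    mask = []
    for t_row, i_row in zip(target_energy, interference_energy):
        row = []
        for t, i in zip(t_row, i_row):
            snr_db = 10.0 * math.log10((t + eps) / (i + eps))
            row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def apply_mask(mixture, mask):
    """Zero out the mixture units that the mask marks as interference."""
    return [
        [m * x for x, m in zip(x_row, m_row)]
        for x_row, m_row in zip(mixture, mask)
    ]


# Toy 2x2 example: rows are frames, columns are frequency channels.
target = [[4.0, 0.1], [1.0, 9.0]]
interference = [[1.0, 2.0], [1.0, 0.5]]
mask = ideal_binary_mask(target, interference)
```

In practice the ideal mask is only available when the clean target and the interference are known separately, so it typically serves as a training label: a classifier such as the DNN described in the abstract would be trained to predict the mask from features of the mixture.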