Speech Separation Based on Improved Deep Neural Networks with Dual Outputs of Speech Features for Both Target and Interfering Speakers

Yanhui Tu,Jun Du,Yong Xu,Lirong Dai,Chin-Hui Lee
DOI: https://doi.org/10.1109/iscslp.2014.6936615
2014-01-01
Abstract:In this paper, a novel deep neural network (DNN) architecture is proposed to generate the speech features of both the target speaker and interferer for speech separation. DNN is adopted here to directly model the highly nonlinear relationship between speech features of the mixed signals and the two competing speakers. With the modified output speech features for learning the parameters of the DNN, the generalization capacity to unseen interferers is improved for separating the target speech. Meanwhile, without any prior information from the interferer, the interfering speech can also be separated. Experimental results show that the proposed new DNN enhances the separation performance in terms of different objective measures under the semi-supervised mode where the training data of the target speaker is provided while the unseen interferer in the separation stage is predicted by using multiple interfering speakers mixed with the target speaker in the training stage.
What problem does this paper attempt to address?