Utterance-level Permutation Invariant Training with Discriminative Learning for Single Channel Speech Separation

Cunhang Fan, Bin Liu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Ye Bai
DOI: https://doi.org/10.1109/iscslp.2018.8706611
2018-01-01
Abstract: The challenge in deep learning for speaker-independent speech separation comes from the label ambiguity, or permutation, problem. The utterance-level permutation invariant training (uPIT) technique, a state-of-the-art deep learning approach, solves this problem by minimizing the mean square error (MSE) over all permutations between outputs and targets. However, uPIT only minimizes the MSE of the chosen permutation, the one with the lowest MSE, and does not discriminate it from the other permutations. This may increase the possibility of remixing the separated sources. In this paper, we propose uPIT with discriminative learning (uPITDL) to solve this problem by adding a regularization term to the cost function. Specifically, we minimize the difference between the outputs of the model and their corresponding reference signals, while also maximizing the dissimilarity between each prediction and the targets of the other sources. We evaluate the proposed model on the WSJ0-2mix dataset. Experimental results show 22.0% and 24.8% relative improvements over the uPIT baseline under the closed and open conditions, respectively.
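To make the two cost functions in the abstract concrete, here is a minimal sketch in pure Python. It assumes toy lists of floats in place of the spectral features a real separation model uses, and it assumes one plausible form of the discriminative regularizer: the uPIT loss minus a weight alpha times the average MSE between each output and the targets of the other sources under the chosen permutation. The names upit_loss, upitdl_loss and alpha are hypothetical illustrations, not the authors' code, and the alpha value is not taken from the paper.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# Toy stand-in for one separated source (e.g. a flattened magnitude
# spectrogram); a real model works on spectral features, not short lists.
Signal = Sequence[float]


def mse(a: Signal, b: Signal) -> float:
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def upit_loss(outputs: List[Signal], targets: List[Signal]) -> Tuple[float, Tuple[int, ...]]:
    """uPIT: average MSE under the best output-to-target assignment.

    perm[i] is the index of the target assigned to output i. The search runs
    once per utterance, so one assignment holds for the whole utterance.
    """
    best_loss, best_perm = float("inf"), tuple(range(len(targets)))
    for perm in permutations(range(len(targets))):
        loss = sum(mse(outputs[i], targets[p]) for i, p in enumerate(perm)) / len(outputs)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


def upitdl_loss(outputs: List[Signal], targets: List[Signal],
                alpha: float = 0.1) -> Tuple[float, Tuple[int, ...]]:
    """Hypothetical uPITDL loss: uPIT minus alpha * MSE to the *other* targets.

    Subtracting the cross-source error means that minimizing the loss also
    pushes each output away from the references it was not assigned to.
    alpha is an assumed weight, not a value from the paper.
    """
    n = len(outputs)
    if n < 2:
        raise ValueError("the discriminative term needs at least two sources")
    base, perm = upit_loss(outputs, targets)
    # Average MSE between each output and every target it was NOT assigned to.
    cross = sum(mse(outputs[i], targets[j])
                for i, p in enumerate(perm)
                for j in range(n) if j != p) / (n * (n - 1))
    return base - alpha * cross, perm


if __name__ == "__main__":
    targets = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    outputs = [[0.1, 0.9, 0.0], [0.9, 0.1, 1.0]]  # sources emitted in swapped order
    base, perm = upit_loss(outputs, targets)
    dl, _ = upitdl_loss(outputs, targets)
    print(perm, round(base, 4), round(dl, 4))  # prints: (1, 0) 0.0067 -0.0807
```

In the usage example, uPIT recovers the swapped assignment (1, 0), and the discriminative term lowers the loss further because the outputs are far from the wrong references. With this subtractive form the total can become negative, so in practice alpha would presumably be kept small enough that the matching term dominates. Exhaustive permutation search costs n! evaluations, which is cheap for the two-speaker WSJ0-2mix setting.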