Unsupervised Single-Channel Speech Separation Via Deep Neural Network for Different Gender Mixtures

Yannan Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee
DOI: https://doi.org/10.1109/apsipa.2016.7820736
2016-01-01
Abstract: In this study, we propose a regression approach via deep neural network (DNN) for unsupervised speech separation in a single-channel setting. We rely on the key assumption that two speakers can be well segregated if they are not too similar to each other. A dissimilarity measure between two speakers is then proposed to characterize how separable competing speakers are. We demonstrate that the distance between speakers of different genders is large enough to make separation feasible. Finally, we propose a DNN architecture with dual outputs, one representing the female speaker group and the other the male speaker group. Trained and tested on the Speech Separation Challenge corpus, the proposed DNN approach achieves large performance gains over state-of-the-art unsupervised techniques without using specific knowledge about the mixed target and interfering speakers, and even outperforms the supervised GMM-based method.
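The dual-output regression idea in the abstract can be illustrated with a minimal sketch. The abstract states only that the network is a regression DNN with one output for the female speaker group and one for the male speaker group; everything else here is an assumption made for illustration: the single sigmoid hidden layer, the layer sizes, the summed mean-squared-error objective, and the hypothetical names `init_params`, `forward`, and `dual_mse`. The inputs stand in for per-frame spectral features of the mixture.

```python
import numpy as np

def init_params(n_in, n_hidden, n_out, seed=0):
    """Randomly initialise one shared hidden layer and two output heads (assumed layout)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W_f": rng.normal(0.0, 0.1, (n_hidden, n_out)),  # female-group head
        "b_f": np.zeros(n_out),
        "W_m": rng.normal(0.0, 0.1, (n_hidden, n_out)),  # male-group head
        "b_m": np.zeros(n_out),
    }

def forward(params, mix):
    """Map mixture features (frames x n_in) to female and male feature estimates."""
    hidden = 1.0 / (1.0 + np.exp(-(mix @ params["W1"] + params["b1"])))
    female = hidden @ params["W_f"] + params["b_f"]  # linear output, regression
    male = hidden @ params["W_m"] + params["b_m"]
    return female, male

def dual_mse(params, mix, female_ref, male_ref):
    """Sum of the mean squared errors of both outputs (assumed objective)."""
    female, male = forward(params, mix)
    return np.mean((female - female_ref) ** 2) + np.mean((male - male_ref) ** 2)

# Usage with placeholder data: 4 frames of 8-dimensional features.
params = init_params(n_in=8, n_hidden=16, n_out=8)
mix = np.ones((4, 8))
female_est, male_est = forward(params, mix)
print(female_est.shape, male_est.shape)
```

Because each head is tied to a gender group rather than to a specific speaker, a network of this shape needs no enrollment data for the speakers in the test mixture, which is the sense in which the abstract describes the approach as unsupervised.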