Binaural Classification for Reverberant Speech Segregation Using Deep Neural Networks

Yi Jiang, DeLiang Wang, RunSheng Liu, ZhenMing Feng
DOI: https://doi.org/10.1109/taslp.2014.2361023
2014-01-01
Abstract: Speech signal degradation in real environments mainly results from room reverberation and concurrent noise. While human listening is robust in complex auditory scenes, current speech segregation algorithms do not perform well in noisy and reverberant environments. We treat the binaural segregation problem as binary classification, and employ deep neural networks (DNNs) for the classification task. The binaural features of the interaural time difference and interaural level difference are used as the main auditory features for classification. The monaural feature of gammatone frequency cepstral coefficients is also used to improve classification performance, especially when interference and target speech are collocated or very close to one another. We systematically examine DNN generalization to untrained spatial configurations. Evaluations and comparisons show that DNN-based binaural classification produces superior segregation performance in a variety of multisource and reverberant conditions.
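
The two binaural cues named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a single broadband frame rather than per-channel gammatone filterbank outputs, takes the interaural time difference (ITD) as the lag of the cross-correlation peak, and takes the interaural level difference (ILD) as the left-to-right energy ratio in dB. The function name `itd_ild`, the `max_lag` parameter, and the synthetic test signal are hypothetical and chosen only for illustration.

```python
import numpy as np

def itd_ild(left, right, fs, max_lag):
    """Estimate ITD (seconds) and ILD (dB) for one frame of a two-channel signal.

    Assumption: broadband frame, no gammatone filterbank. A positive ITD
    means the left channel lags the right channel.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)

    # Full cross-correlation; output index i corresponds to lag i - (len(right) - 1).
    cc = np.correlate(left, right, mode="full")
    lags = np.arange(-(len(right) - 1), len(left))

    # Search only physically plausible lags.
    keep = np.abs(lags) <= max_lag
    best_lag = lags[keep][np.argmax(cc[keep])]
    itd = best_lag / fs

    # Energy ratio in dB; eps guards against log of zero on silent frames.
    eps = 1e-12
    ild = 10.0 * np.log10((np.sum(left ** 2) + eps) / (np.sum(right ** 2) + eps))
    return itd, ild

# Synthetic example: the left channel is the right channel delayed by
# 3 samples and attenuated by a factor of 0.5.
rng = np.random.default_rng(0)
fs = 16000
delay = 3
right = rng.standard_normal(1024)
left = 0.5 * np.concatenate([np.zeros(delay), right[:-delay]])

itd, ild = itd_ild(left, right, fs, max_lag=16)
print(f"ITD = {itd * 1e6:.1f} microseconds, ILD = {ild:.2f} dB")
```

In a classification setting like the one the abstract describes, such cues would be computed for each time-frequency unit and passed to a classifier that labels the unit as target-dominant or interference-dominant. The broadband version here only shows how the cues themselves are defined.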