Auditory Feature for Monaural Speech Segregation

Yi Jiang, Runsheng Liu, Yuanyuan Zu
DOI: https://doi.org/10.2991/icieac-14.2014.16
2014-01-01
Abstract: Monaural speech segregation has been a very challenging problem in speech signal processing. Applying the ideal binary mask to an auditory mixture has been shown to yield substantial improvements in signal-to-noise ratio (SNR) and intelligibility. In this paper, we use gammatone frequency cepstral coefficients (GFCC), an auditory feature computed at the time-frequency (T-F) unit level, to estimate the ideal binary mask for monaural speech segregation. The paper reports a successful attempt to use GFCC as the segregation cue with a deep neural network (DNN) classifier. Results show that robust performance can be achieved across noisy and reverberant conditions.
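For readers unfamiliar with the ideal binary mask mentioned in the abstract, the following is a minimal sketch of its usual definition: a T-F unit is labeled 1 when its local SNR exceeds a threshold and 0 otherwise. The function name `ideal_binary_mask`, the 0 dB default for `lc_db`, and the toy energy values are assumptions made for illustration; the abstract does not state the threshold or the T-F decomposition used by the authors, and this is not their implementation.

```python
import math


def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Label each T-F unit 1 if its local SNR in dB exceeds lc_db, else 0.

    speech_energy and noise_energy are equally shaped lists of rows holding
    the premixed speech and noise energy of each unit. lc_db is an assumed
    local criterion, not a value taken from the paper.
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0.0:
                # No speech energy in this unit: treat it as noise-dominated.
                row.append(0)
            elif n <= 0.0:
                # Speech present and no noise: treat it as speech-dominated.
                row.append(1)
            else:
                snr_db = 10.0 * math.log10(s / n)
                row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


# Toy energies (hypothetical values), indexed as [row][column] of T-F units.
speech = [[4.0, 0.5], [1.0, 9.0]]
noise = [[1.0, 2.0], [1.0, 3.0]]
mask = ideal_binary_mask(speech, noise)
```

In the setting the abstract describes, a mask of this kind would serve as the training target, with a classifier predicting each unit's label from its GFCC features, since the premixed speech and noise are not available at test time.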