Deep Learning Based Binaural Speech Separation in Reverberant Environments

Xueliang Zhang, DeLiang Wang
DOI: https://doi.org/10.1109/taslp.2017.2687104
2017-01-01
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Abstract: Speech signals are usually degraded by room reverberation and additive noise in real environments. This paper focuses on separating the target speech signal from binaural inputs in reverberant conditions. Binaural separation is formulated as a supervised learning problem, and we employ deep learning to map from both spatial and spectral features to a training target. With binaural inputs, we first apply a fixed beamformer and then extract several spectral features. A new spatial feature is proposed and extracted to complement the spectral features. The training target is the recently suggested ideal ratio mask. Systematic evaluations and comparisons show that the proposed system achieves very good separation performance and substantially outperforms related algorithms under challenging multisource and reverberant environments.
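
To make the training target concrete, here is a minimal sketch of an ideal ratio mask computed from per-bin speech and noise energies. The abstract does not give the formula, so the square-root form below (exponent `beta=0.5`) is an assumption taken from the commonly used definition. The `average_beamformer` helper is also only an illustrative stand-in for a fixed beamformer that assumes a target directly in front of the listener; the function names, `beta`, and `eps` are hypothetical and do not come from the paper.

```python
import numpy as np


def average_beamformer(left, right):
    """Illustrative fixed beamformer (assumption): average the two ear
    signals, which reinforces a source located directly in front."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    return 0.5 * (left + right)


def ideal_ratio_mask(speech_energy, noise_energy, beta=0.5, eps=1e-12):
    """Ideal ratio mask per time-frequency unit (assumed common form):
    (S / (S + N)) ** beta, where S and N are speech and noise energies."""
    speech_energy = np.asarray(speech_energy, dtype=float)
    noise_energy = np.asarray(noise_energy, dtype=float)
    # eps guards against division by zero in silent time-frequency units.
    return (speech_energy / (speech_energy + noise_energy + eps)) ** beta


# Toy energies for a 2 x 2 grid of time-frequency units.
speech = np.array([[1.0, 0.0], [3.0, 1.0]])
noise = np.array([[1.0, 1.0], [1.0, 3.0]])
mask = ideal_ratio_mask(speech, noise)

# Identical ear signals pass through the averaging beamformer unchanged.
mono = average_beamformer([0.2, -0.4], [0.2, -0.4])
```

Mask values lie between 0 and 1, approaching 1 where speech dominates a unit and 0 where noise dominates. In a supervised setup of the kind the abstract describes, a network would be trained to predict such a mask from the extracted features.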