Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information

Rongzhi Gu, Lianwu Chen, Shi-Xiong Zhang, Jimeng Zheng, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu
DOI: https://doi.org/10.21437/interspeech.2019-2266
2019-01-01
Abstract: Recent work on deep learning for supervised speech separation has significantly accelerated progress on the multi-talker speech separation problem. Multi-channel approaches have attracted considerable research attention because they can exploit spatial information. In this paper, alongside the power spectra and inter-channel spatial features at the input level, we explore directional features, which indicate the speaker source from the desired target direction, for target speaker separation. In addition, we incorporate an attention mechanism that dynamically steers the model toward the reliable input features, alleviating the spatial ambiguity problem that arises when multiple speakers are closely located. We demonstrate on the far-field WSJ0 2-mix dataset that the proposed approach significantly improves speech separation performance over baseline single-channel and multi-channel speech separation methods.
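To make the idea of a directional feature concrete, here is a minimal sketch in Python of one common formulation. It is not taken from the paper. It assumes a single microphone pair, a far-field source, and a known target angle. The function names, the speed-of-sound constant, and the geometry convention (angle measured from the axis pointing from microphone b to microphone a) are all illustrative assumptions. The feature compares the observed inter-channel phase difference with the phase difference expected for the target direction, so time-frequency bins dominated by the target score close to 1.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for room-temperature air


def target_phase_difference(freqs_hz, mic_distance_m, theta_rad):
    """Expected phase difference between two mics for a far-field source.

    Hypothetical helper: theta is measured from the axis pointing from
    mic b to mic a, so mic b lags mic a by d * cos(theta) / c seconds.
    """
    delay_s = mic_distance_m * np.cos(theta_rad) / SPEED_OF_SOUND
    return 2.0 * np.pi * freqs_hz * delay_s


def directional_feature(spec_a, spec_b, freqs_hz, mic_distance_m, theta_rad):
    """Cosine similarity between observed and expected phase differences.

    spec_a, spec_b: complex STFTs of shape (frames, freqs).
    Returns a real array of the same shape with values in [-1, 1].
    """
    # Observed inter-channel phase difference (IPD), wrapped to (-pi, pi].
    ipd = np.angle(spec_a * np.conj(spec_b))
    # Expected phase difference for the assumed target direction.
    tpd = target_phase_difference(freqs_hz, mic_distance_m, theta_rad)
    # The cosine is insensitive to phase wrapping.
    return np.cos(ipd - tpd[np.newaxis, :])
```

With more than one microphone pair, a feature of this kind would typically be summed or averaged over pairs. When two speakers lie in nearly the same direction their features become similar, which is the spatial ambiguity the abstract refers to.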