Beamforming and Deep Models Integrated Multi-talker Speech Separation

Chao Peng, Xihong Wu, Tianshu Qu
DOI: https://doi.org/10.1109/icsidp47821.2019.9173118
2019-01-01
Abstract: Permutation Invariant Training (PIT) has attracted much attention recently, but it performs poorly on datasets with an unknown number of speakers. In this paper, we propose an approach based on beamforming and deep models (BDM) to address this problem. BDM first estimates the number of speakers with a sound source localization algorithm and then enhances the target speech with beamforming in the spatial domain. Supervised deep models are then used to extract the clean speech of the target speaker in the time and frequency domain. Experimental results show that BDM improves both separation performance and speech intelligibility compared with single-channel and multi-channel PIT when tested on two-speaker, three-speaker, and four-speaker mixtures.
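The spatial enhancement step can be illustrated with a minimal sketch. The abstract does not say which beamformer BDM uses, so the example below assumes the simplest one, delay-and-sum, with integer sample delays that are taken as already known from a localization stage. The function name `delay_and_sum` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def delay_and_sum(signals, delays):
    """Steer a microphone array toward one source (illustrative sketch only).

    signals: array of shape (n_mics, n_samples), one row per microphone.
    delays:  assumed integer arrival delay (in samples) of the target
             source at each microphone, e.g. derived from localization.
    """
    signals = np.asarray(signals, dtype=float)
    n_mics, n_samples = signals.shape
    output = np.zeros(n_samples)
    for channel, delay in zip(signals, delays):
        # Advance each channel by its delay so the target is time-aligned.
        # np.roll shifts circularly, which is a simplification; a real
        # system would pad or use fractional delays instead.
        output += np.roll(channel, -int(delay))
    # Averaging keeps the aligned target at its original level while
    # misaligned interferers add incoherently and are attenuated.
    return output / n_mics
```

In a pipeline of the kind the abstract describes, the output of such a beamformer for each estimated speaker direction would then be passed to a supervised model for further cleanup. That second stage is not sketched here because the abstract gives no details about the model.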