Single-channel Multi-speakers Speech Separation Based on Isolated Speech Segments.

Shanfa Ke, Zhongyuan Wang, Ruimin Hu, Xiaochen Wang
DOI: https://doi.org/10.1007/s11063-022-10887-6
IF: 2.565
2022-01-01
Neural Processing Letters
Abstract: In a real multi-speaker scenario, the signal collected by the microphone contains many time periods in which only one speaker is talking; these are called isolated speech segments. Based on this observation, this paper proposes a single-channel multi-speaker speech separation method that relies on the similarity between a speaker feature center and the mixture features in a deep embedding space. Specifically, the isolated speech segments extracted from the observed signal are converted to deep embedding vectors, from which a speaker feature center is created. The similarity between this center and the deep embedding features of the mixture is used as a mask for the corresponding speaker, and the mask is applied to separate that speaker's speech. A residual-based deep embedding network with stacked 2-D convolutional blocks, in place of bi-directional long short-term memory, is proposed for faster processing and better feature extraction. In addition, an isolated speech segment extraction method based on Chimera++ is proposed, because previous experiments showed that the Chimera++ algorithm performs well on segments containing only one speaker. Evaluation results on general datasets show that the proposed method outperforms competing algorithms by up to 0.94 dB in Signal-to-Distortion Ratio.
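The masking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the center or the similarity is computed, so the sketch assumes the center is the mean of unit-normalized embeddings and the similarity is cosine similarity clipped to [0, 1]. The function names (unit_normalize, speaker_center, similarity_mask, separate) are hypothetical, and the embeddings are taken as given rather than produced by the residual convolutional network.

```python
import numpy as np

def unit_normalize(x, eps=1e-8):
    """Scale each vector along the last axis to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def speaker_center(isolated_embeddings):
    """Build one speaker's feature center from isolated-segment embeddings.

    Assumption: the center is the normalized mean of the normalized embeddings.
    isolated_embeddings has shape (N, D).
    """
    center = unit_normalize(isolated_embeddings).mean(axis=0)
    return unit_normalize(center)

def similarity_mask(mixture_embeddings, center):
    """Cosine similarity of each time-frequency embedding to the center.

    Assumption: negative similarities are clipped so the mask lies in [0, 1].
    mixture_embeddings has shape (T * F, D); the result has shape (T * F,).
    """
    sim = unit_normalize(mixture_embeddings) @ center
    return np.clip(sim, 0.0, 1.0)

def separate(mixture_magnitude, mixture_embeddings, isolated_embeddings):
    """Mask the (T, F) mixture magnitude spectrogram for one speaker."""
    center = speaker_center(isolated_embeddings)
    mask = similarity_mask(mixture_embeddings, center)
    return mask.reshape(mixture_magnitude.shape) * mixture_magnitude
```

In this sketch, time-frequency bins whose embeddings point in the same direction as the speaker center keep their energy, while bins whose embeddings are orthogonal or opposed to it are suppressed. A real system would also need the mixture phase to reconstruct a waveform.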