Separation of Multiple Speech Sources in Reverberant Environments Based on Sparse Component Enhancement
Lu Li, Maoshen Jia, Jinxiang Liu, Tun-Wen Pai
DOI: https://doi.org/10.1007/s00034-023-02383-6
IF: 2.311
2023-01-01
Circuits Systems and Signal Processing
Abstract: Multiple speech source separation plays an important role in many applications such as automatic speech recognition, acoustical surveillance, and teleconferencing. In this study, we propose a method for the separation of multiple speech sources in a reverberant environment based on sparse component enhancement. In a recorded signal (i.e., a mixture of multiple speech sources), there are always time–frequency points where only one source is active or dominant. This property reflects the sparsity of speech signals, and such time–frequency points are called sparse component points. In a reverberant environment, however, this sparsity is degraded: the number of sparse component points in the recorded signal decreases, which lowers the quality of the separated source signals. In this study, for mixture signals recorded by a soundfield microphone (a microphone array), we first experimentally analyze the negative impact of reverberation on sparse components and then develop a sparse component enhancement method to increase the number of these points. Then, the sparse components are identified and classified according to the estimated directions of arrival of the sources. Next, the sparse components are used to guide the recovery of the non-sparse components. Finally, multiple source separation is achieved by the joint restoration of the sparse and non-sparse components of each source. The proposed method has low computational complexity and applies to underdetermined scenarios. A series of subjective and objective evaluation experiments verifies the effectiveness of the method.
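
The identification and classification step described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify in detail. The sketch assumes horizontal B-format STFT coefficients (`W`, `X`, `Y`) from a soundfield microphone, estimates a per-bin azimuth from the intensity-like products of the channels (positive channel gains such as the usual W scaling do not change the angle), and flags a frequency zone as sparse when its direction estimates agree closely. The function names, the zone size, and the circular-variance threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def tf_doa(W, X, Y):
    """Per-bin azimuth estimate (radians) from horizontal B-format STFT arrays.

    Assumption: X ~ S*cos(theta), Y ~ S*sin(theta), W ~ S up to a positive gain.
    """
    ix = np.real(np.conj(W) * X)
    iy = np.real(np.conj(W) * Y)
    return np.arctan2(iy, ix)

def sparse_mask(doa, zone=4, max_circ_var=0.05):
    """Flag TF bins lying in a frequency zone with nearly constant DOA.

    Hypothetical criterion: circular variance of the DOA estimates inside
    each zone of `zone` adjacent frequency bins must not exceed max_circ_var.
    Bins left over after the last full zone stay unflagged.
    """
    n_freq, _ = doa.shape
    mask = np.zeros(doa.shape, dtype=bool)
    for start in range(0, n_freq - zone + 1, zone):
        block = doa[start:start + zone]                 # (zone, n_frames)
        resultant = np.abs(np.mean(np.exp(1j * block), axis=0))
        mask[start:start + zone] = (1.0 - resultant) <= max_circ_var
    return mask

def classify_by_doa(doa, mask, source_doas):
    """Label each sparse bin with the index of the nearest source DOA.

    Non-sparse bins receive the label -1.
    """
    source_doas = np.asarray(source_doas, dtype=float)
    # Wrapped angular difference between every bin and every source DOA.
    diff = np.angle(np.exp(1j * (doa[..., None] - source_doas)))
    labels = np.argmin(np.abs(diff), axis=-1)
    return np.where(mask, labels, -1)

# Usage on synthetic data: one source at 60 degrees in every bin.
rng = np.random.default_rng(0)
S = rng.standard_normal((8, 5)) + 1j * rng.standard_normal((8, 5))
theta = np.deg2rad(60.0)
doa = tf_doa(S, S * np.cos(theta), S * np.sin(theta))
mask = sparse_mask(doa)
labels = classify_by_doa(doa, mask, [theta, np.deg2rad(200.0)])
```

In this sketch the bins labelled -1 are the non-sparse components, which the abstract says are recovered afterwards under the guidance of the sparse ones; that recovery step and the sparse component enhancement itself are not modelled here.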