L-TCN with the Help of Attention Weighting for the Speech Separation Task in the Reverberation Environment

Xiyu Song, Zhengyi An, Zhenghong Liu, Fangzhi Yao, Xiaodong Lin, Mei Wang
DOI: https://doi.org/10.2139/ssrn.4345666
2023-01-01
Abstract: Speech separation aims to extract a target speaker's speech from a mixture. However, the noise and reverberation present in real environments make separation difficult. To address this, a multi-channel microphone array is used to extract spatial information about the target speech. However, the number of inter-channel phase differences (IPDs) grows quadratically with the number of microphones, because one IPD is computed for each microphone pair. Using all IPDs therefore imposes a heavy load on the system, so we propose using an attention mechanism to weight the IPDs when extracting the spatial information of the target speech. Moreover, the temporal convolutional network (TCN) performs well in speech separation, but the large number of parameters in its deep dilated convolutions places a heavy burden on the system. We therefore propose a speech separation method based on a lightweight temporal convolutional network (L-TCN) aided by attention weighting. Compared with the control experiment, the proposed method reduces the number of parameters by 90% and doubles the utilization rate of the IPDs. While reducing the system load, it also increases the short-time objective intelligibility (STOI) by 0.19 and the scale-invariant signal-to-distortion ratio (SI-SDR) by 6.33 dB.
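The growth in IPD count and the idea of attention-weighting IPDs can be illustrated with a short sketch. This is a minimal NumPy example under stated assumptions, not the authors' L-TCN implementation. It enumerates all M(M-1)/2 microphone pairs, encodes each pair's IPD as cos/sin features, and fuses the pairs with softmax attention weights. The function names (`mic_pairs`, `ipd_features`, `attention_weighted_ipd`) are hypothetical, and the fixed `query` vector is an assumed stand-in for attention parameters that a real model would learn.

```python
import itertools

import numpy as np


def mic_pairs(num_mics):
    """All unordered microphone pairs (i, j), i < j: M * (M - 1) / 2 of them."""
    return list(itertools.combinations(range(num_mics), 2))


def ipd_features(stft, pairs):
    """cos/sin IPD features for each microphone pair.

    stft: complex array of shape (num_mics, freq, frames).
    Returns a real array of shape (num_pairs, 2, freq, frames).
    """
    phase = np.angle(stft)
    feats = []
    for i, j in pairs:
        ipd = phase[i] - phase[j]  # cos/sin make 2*pi wrapping irrelevant
        feats.append(np.stack([np.cos(ipd), np.sin(ipd)]))
    return np.stack(feats)


def attention_weighted_ipd(feats, query):
    """Fuse per-pair IPD features with softmax attention weights.

    query: shape (2,), a hypothetical stand-in for learned attention parameters.
    Returns (fused features of shape (2, freq, frames), weights of shape (num_pairs,)).
    """
    summary = feats.mean(axis=(2, 3))        # (num_pairs, 2) per-pair summary
    scores = summary @ query                 # (num_pairs,) relevance scores
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    fused = np.tensordot(weights, feats, axes=1)  # weighted sum over pairs
    return fused, weights


if __name__ == "__main__":
    print([len(mic_pairs(m)) for m in (2, 4, 6, 8)])  # [1, 6, 15, 28]
    rng = np.random.default_rng(0)
    shape = (4, 257, 100)  # 4 mics, 257 frequency bins, 100 frames
    stft = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    feats = ipd_features(stft, mic_pairs(4))
    fused, weights = attention_weighted_ipd(feats, np.array([1.0, -0.5]))
    print(feats.shape, fused.shape)  # (6, 2, 257, 100) (2, 257, 100)
```

In this sketch the fused feature keeps the same size however many microphones are used, while the number of pairs grows from 1 to 28 as M goes from 2 to 8.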