Boosting the Performance of SpEx+ by Attention and Contextual Mechanism

Chenyi Li, Zhiyong Wu, Wei Rao, Yannan Wang, Helen Meng
DOI: https://doi.org/10.1109/ISCSLP57327.2022.10038014
2022-01-01
Abstract: Target speaker extraction (TSE) aims to mimic human selective attention by extracting the voice of interest from a multi-talker environment. Time-domain methods, represented by SpEx+ [1], have advanced TSE, but residual noise, squeaks, and over-suppression still remain in the extracted speech. In this paper, we explore three ways to improve the performance of SpEx+: two attention-based weight-learning mechanisms that operate on different dimensions to generate representative features, and a context mechanism that refines the extracted masks. Experiments on both single-channel and multi-channel signals provide preliminary evidence that these methods are effective on SpEx+, especially in improving speech quality and alleviating squeaks, unexpected noise, and over-suppression.
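The abstract does not say which dimensions the two attention mechanisms operate on or how they are parameterized. As a rough illustration only, the sketch below assumes a channel dimension and a time (frame) dimension of an encoder feature map and uses two generic, well-known techniques in their place: squeeze-and-excitation style channel attention and softmax weighting over frames. The functions `channel_attention` and `frame_attention` and the weights `w1`, `w2`, and `q` are hypothetical and would normally be learned; this is not the authors' implementation.

```python
import math
from typing import List

# Hypothetical feature map layout: x[channel][frame].
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: Matrix, v: List[float]) -> List[float]:
    """Multiply a weight matrix (rows x len(v)) by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def channel_attention(x: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Squeeze-and-excitation style re-weighting of the channel dimension.

    x:  C x T feature map
    w1: R x C bottleneck weights (hypothetical; learned in practice)
    w2: C x R expansion weights (hypothetical; learned in practice)
    """
    # Squeeze: average each channel over time.
    z = [sum(row) / len(row) for row in x]
    # Excite: bottleneck + ReLU, then expansion + sigmoid gives one weight per channel.
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    s = [sigmoid(v) for v in matvec(w2, hidden)]
    # Scale every frame of channel c by its weight s[c].
    return [[s_c * v for v in row] for s_c, row in zip(s, x)]


def frame_attention(x: Matrix, q: List[float]) -> Matrix:
    """Re-weight the time dimension with softmax scores from a query vector q (length C)."""
    num_channels, num_frames = len(x), len(x[0])
    # Score each frame by the dot product of q with that frame's feature vector.
    scores = [sum(q[c] * x[c][t] for c in range(num_channels)) for t in range(num_frames)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    # Rescale so that equal scores leave the features unchanged (every weight == 1).
    a = [e / total * num_frames for e in exps]
    return [[row[t] * a[t] for t in range(num_frames)] for row in x]
```

With zero weights every channel gets a gate of 0.5 and every frame a weight of 1, so `channel_attention([[1.0, 3.0], [2.0, 2.0]], [[0.0, 0.0]], [[0.0], [0.0]])` halves the input. The sigmoid gate keeps each channel weight in (0, 1), while the softmax makes frames compete for a fixed total weight; which of these (if either) matches the paper's design is not stated in the abstract.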