An Online Speaker-aware Speech Separation Approach Based on Time-domain Representation

Hui Wang, Yan Song, Zeng-Xi Li, Ian McLoughlin, Li-Rong Dai
DOI: https://doi.org/10.1109/icassp40776.2020.9053068
2020-01-01
Abstract: Despite significant progress in deep-learning-based speech separation, extracting and tracking the speech of target speakers remains challenging, especially in single-channel, multi-speaker conditions. Previously, the authors proposed a source-aware context network that exploits the temporal context in mixtures and estimated sources for online speech separation. In this paper, we propose a speaker-aware approach built on the source-aware context network structure, in which speaker information is explicitly modeled by an auxiliary speaker identification branch. Speech separation and speaker tracking can then be jointly optimized through multi-task learning. Furthermore, we study the effectiveness of time-domain representation by proposing a raw sparse waveform encoder to preserve discriminative information. Experimental results on the WSJ0-2mix benchmark show that the proposed system significantly improves Signal-to-Distortion Ratio (SDR) performance.
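To make the joint optimization concrete, the following is a minimal sketch of how a separation objective and a speaker-identification objective might be combined into one multi-task loss. It is an illustration under stated assumptions, not the authors' implementation: the abstract gives no loss formula, so the function names `sdr_db`, `cross_entropy`, and `joint_loss`, the weight value, and the use of negative SDR as the separation term are all hypothetical. The SDR here is the simplified energy-ratio form, 10 log10(||s||^2 / ||s - s_hat||^2), which may differ from the evaluation metric reported in the paper.

```python
import numpy as np

def sdr_db(reference, estimate, eps=1e-8):
    """Simplified SDR in dB: reference energy over error energy (assumed form)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    error = reference - estimate
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(error ** 2) + eps))

def cross_entropy(logits, speaker_id):
    """Softmax cross-entropy for a single speaker-identification prediction."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[speaker_id]

def joint_loss(reference, estimate, logits, speaker_id, weight=0.1):
    """Hypothetical multi-task loss: negative SDR plus weighted speaker-ID loss.

    `weight` is an illustrative value, not one taken from the paper.
    """
    return -sdr_db(reference, estimate) + weight * cross_entropy(logits, speaker_id)

# Toy example: the estimate is the reference scaled by 0.9, and the
# speaker-identification branch is undecided between two speakers.
reference = np.array([1.0, 0.0, -1.0, 0.0])
estimate = 0.9 * reference
print(sdr_db(reference, estimate))
print(joint_loss(reference, estimate, logits=[0.0, 0.0], speaker_id=0))
```

Minimizing the combined value rewards a higher SDR and a more confident, correct speaker prediction at the same time, which is the general idea behind training the two branches jointly.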