Denoi-SpEx+: A Speaker Extraction Network based Speech Dialogue System

Yun Hao, Xiangkang Huang, Huichou Huang, Qingyao Wu
DOI: https://doi.org/10.1109/ICEBE52470.2021.00030
2021-01-01
Abstract: Speech dialogue systems are now widely used in daily life, letting users consult and communicate with a system in natural language. In practical applications, however, real dialogue scenes contain background speech from third parties as well as background noise. The uncertainty and complexity of these background sounds degrade the system's recognition performance. A good speech enhancement module can help separate the target speaker from the original speech. SpEx+, a recently proposed time-domain solution, needs a reference speech to assist in training. In actual applications this reference speech may itself be noisy, which hurts performance. We therefore propose the Denoi-SpEx+ model, which adds a speech denoising network that processes the reference speech before it is input to the network, helping to ensure the quality of speech separation in practical applications. Experiments show that our model significantly improves speech separation performance when the reference speech is noisy.
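The data flow described in the abstract (denoise the reference speech first, then pass it with the mixture to the extraction network) can be sketched roughly as follows. This is only an illustration of the wiring, not the authors' implementation: `denoise_then_extract`, `moving_average_denoiser`, and the identity-style extractor are hypothetical stand-ins, and the moving average merely takes the place of the paper's denoising network.

```python
from typing import Callable, List

Waveform = List[float]


def moving_average_denoiser(signal: Waveform, width: int = 3) -> Waveform:
    """Toy stand-in for a denoising network (assumption): centered moving average."""
    half = width // 2
    smoothed = []
    for i in range(len(signal)):
        # Window is truncated at the signal edges.
        window = signal[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def denoise_then_extract(
    mixture: Waveform,
    reference: Waveform,
    denoiser: Callable[[Waveform], Waveform],
    extractor: Callable[[Waveform, Waveform], Waveform],
) -> Waveform:
    """Clean the reference speech, then hand it to the speaker extractor."""
    clean_reference = denoiser(reference)
    return extractor(mixture, clean_reference)


if __name__ == "__main__":
    received = {}

    def placeholder_extractor(mixture: Waveform, reference: Waveform) -> Waveform:
        # Hypothetical placeholder: records the reference it was given and
        # returns the mixture unchanged. A real extractor would be a trained model.
        received["reference"] = reference
        return mixture

    noisy_reference = [0.0, 3.0, 0.0]
    mixture = [0.2, -0.1, 0.4, 0.0]
    output = denoise_then_extract(
        mixture, noisy_reference, moving_average_denoiser, placeholder_extractor
    )
    print("reference seen by extractor:", received["reference"])
    print("output length:", len(output))
```

Passing the denoiser and extractor as callables keeps the sketch independent of any particular model; in practice both would presumably be trained networks operating on time-domain waveforms.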