A Lightweight Hybrid Multi-Channel Speech Extraction System with Directional Voice Activity Detection

Tianchi Sun, Tong Lei, Xu Zhang, Yuxiang Hu, Changbao Zhu, Jing Lu
DOI: https://doi.org/10.1109/icassp48485.2024.10445953
2024-01-01
Abstract: Although deep learning (DL) based end-to-end models have shown outstanding performance in multi-channel speech extraction, their high computational complexity restricts their practical application on edge devices. In this paper, we propose a hybrid system that more effectively integrates the generalized sidelobe canceller (GSC) and a lightweight post-filtering model with the assistance of spatial speaker activity information provided by a directional voice activity detection (DVAD) module. In addition to guiding the update of the adaptive blocking matrix (ABM) and the adaptive interference canceller (AIC) used in the GSC to alleviate distortion of the desired speech, the DVAD output is also used as an auxiliary input to the post-filtering model to enhance its interference suppression capability. The experimental results demonstrate that, at a much lower computational cost, our method achieves performance comparable to that of a current state-of-the-art end-to-end model on simulated data and generalizes even better on real-world data.
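To make the idea of activity-guided adaptation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a binary per-sample target-activity flag standing in for the DVAD output, a time-domain NLMS filter standing in for the AIC, and a noise reference that is already free of target speech (the job of the ABM, which is not modelled here). The function names `nlms_step`, `gated_aic`, and `demo`, the filter length, and the step size are all hypothetical choices for illustration. The AIC is adapted only while the target is flagged inactive, so that the filter does not learn to cancel the desired speech; in classic robust GSC designs the ABM is typically adapted in the complementary condition, which may or may not match the paper's exact control logic.

```python
import random


def nlms_step(weights, ref, desired, mu=0.5, eps=1e-8, adapt=True):
    """One NLMS step. Returns (error, weights); weights change only if adapt is True."""
    estimate = sum(w * x for w, x in zip(weights, ref))
    error = desired - estimate
    if adapt:
        norm = sum(x * x for x in ref) + eps
        weights = [w + mu * error * x / norm for w, x in zip(weights, ref)]
    return error, weights


def gated_aic(beam, noise_ref, target_active, taps=4, mu=0.5):
    """Subtract filtered noise_ref from beam, adapting only while the target is inactive.

    beam          : fixed-beamformer output (target plus leaked interference)
    noise_ref     : interference reference (assumed target-free)
    target_active : per-sample booleans, a stand-in for a DVAD decision
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for d, x, active in zip(beam, noise_ref, target_active):
        history = [x] + history[:-1]
        err, weights = nlms_step(weights, history, d, mu=mu, adapt=not active)
        out.append(err)
    return out, weights


def demo(seed=0, n=4000):
    """Synthetic check: interference leaks into the beam through a 2-tap path."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    half = n // 2
    # The target is silent in the first half and active in the second half.
    target = [0.0] * half + [rng.gauss(0.0, 1.0) for _ in range(n - half)]
    active = [t >= half for t in range(n)]
    beam = [
        target[t] + 0.8 * noise[t] + 0.3 * (noise[t - 1] if t > 0 else 0.0)
        for t in range(n)
    ]
    out, weights = gated_aic(beam, noise, active)
    return out, target, weights, half
```

In this toy setup the filter converges during the target-free first half and is frozen afterwards, so the output in the second half should closely track the target signal. Updating the filter while the target is active would instead bias the weights, which is the kind of desired-speech distortion the activity gating is meant to limit.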