SAPVAD: An Efficient Voice Activity Detection Model Based on Spectral Attention and Parallel Structure.

Jiaqi Chen, Xiaofeng Jin, Guirong Wang, Mingdong Yu, Xinghua Lu
DOI: https://doi.org/10.1109/CISP-BMEI60920.2023.10373253
2023-01-01
Abstract: Recently, parallel architectures that combine global and local features have achieved remarkable performance in speech recognition tasks. Inspired by this, we propose a voice activity detection (VAD) method that further improves performance under adverse noise conditions with low signal-to-noise ratio (SNR). Specifically, we replace the temporal attention module in STAM [1] with a parallel structure and introduce adaptive mask convolution to capture vocal characteristics in acoustic signals at a finer granularity. With a parallel architecture tailored to VAD, our SAPVAD method comprehensively captures both acoustic-environment and speech features, enabling accurate speech detection in noisy environments.
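The abstract does not specify how the parallel structure is built, so the following is only a minimal sketch of the general idea of running a global branch and a local branch side by side over a sequence of frame features. Everything in it is an assumption made for illustration: the function names (`global_branch`, `local_branch`, `parallel_block`), the unparameterised self-attention used as the global branch, the fixed smoothing kernel used as the local branch, and the merge by concatenation. It does not reproduce SAPVAD, its spectral attention, or its adaptive mask convolution.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_branch(frames):
    """Assumed global branch: scaled dot-product self-attention over time.

    No learned projections are used; each output frame is a weighted
    average of all input frames, so it sees the whole utterance.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    out = []
    for query in frames:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in frames]
        weights = softmax(scores)
        out.append([sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)])
    return out


def local_branch(frames, kernel=(0.25, 0.5, 0.25)):
    """Assumed local branch: per-feature 1-D convolution over time.

    The kernel is a fixed smoothing filter (hypothetical, not learned);
    edges are handled by repeating the first and last frame.
    """
    half = len(kernel) // 2
    n, dim = len(frames), len(frames[0])
    out = []
    for t in range(n):
        row = [0.0] * dim
        for j, w in enumerate(kernel):
            src = min(max(t + j - half, 0), n - 1)  # clamp index at the edges
            for d in range(dim):
                row[d] += w * frames[src][d]
        out.append(row)
    return out


def parallel_block(frames):
    """Run both branches on the same input and concatenate per frame."""
    g = global_branch(frames)
    l = local_branch(frames)
    return [g_row + l_row for g_row, l_row in zip(g, l)]
```

Calling `parallel_block` on a list of T frames with D features each returns T frames with 2D features: the first D come from the attention branch, the last D from the convolution branch. In a real model each branch would carry learned weights and the merged features would feed a per-frame speech/non-speech classifier.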