Phase Continuity-Aware Self-Attentive Recurrent Network with Adaptive Feature Selection for Robust VAD

Minjie Tang, Hao Huang, Wenbo Zhang, Liang He
DOI: https://doi.org/10.1109/icassp48485.2024.10446084
2024-01-01
Abstract: Deep neural network (DNN) approaches have significantly advanced voice activity detection (VAD). However, most current DNN-based VAD methods ignore the rich audio information in the phase domain. Using this auxiliary information sensibly while coping with low signal-to-noise ratio (SNR) background noise therefore remains a challenge for VAD. To address this problem, we propose a noise-robust VAD model called the phase continuity-aware self-attentive recurrent network (PC-ARN). For the input of PC-ARN, we draw inspiration from recent speech enhancement research by introducing phase-related features, and we employ an adaptive feature selection module (AFSM) to combine them efficiently with magnitude features. The backbone network is an ARN module that combines the attention mechanism with a recurrent neural network (RNN), so it can model the relationship between local and global information to improve VAD performance. Experimental results show that our method has strong generalization ability and robustness compared to traditional VAD techniques.
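The abstract does not spell out which phase-related features are used or how the AFSM weights them, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes the phase feature is the wrapped frame-to-frame phase difference per frequency bin, and it stands in for the AFSM with a simple sigmoid gate whose weights (`w_mag`, `w_phase`, `bias`) are hypothetical fixed values rather than learned parameters. All function names are illustrative.

```python
import cmath
import math


def frame_spectrum(frame):
    """Naive DFT of one real frame; returns the non-negative-frequency bins."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def wrap_phase(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def magnitude_and_phase_delta(signal, frame_len=16, hop=8):
    """Return per-frame log-magnitudes and wrapped frame-to-frame phase differences."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    spectra = [frame_spectrum(signal[i:i + frame_len]) for i in starts]
    log_mag = [[math.log(abs(c) + 1e-8) for c in s] for s in spectra]
    phase = [[cmath.phase(c) for c in s] for s in spectra]
    # The first frame has no predecessor, so its phase difference is set to zero.
    delta = [[0.0] * len(phase[0])]
    for prev, cur in zip(phase, phase[1:]):
        delta.append([wrap_phase(c - p) for p, c in zip(prev, cur)])
    return log_mag, delta


def gated_fusion(mag_feat, phase_feat, w_mag=1.0, w_phase=1.0, bias=0.0):
    """Hypothetical stand-in for the AFSM: a per-bin sigmoid gate mixes the two features.

    In a trained model the gate weights would be learned; here they are fixed.
    """
    fused = []
    for m, p in zip(mag_feat, phase_feat):
        gate = 1.0 / (1.0 + math.exp(-(w_mag * m + w_phase * p + bias)))
        fused.append(gate * m + (1.0 - gate) * p)
    return fused


# Usage: a short synthetic tone, fused frame by frame.
signal = [math.sin(2 * math.pi * 3 * t / 64) for t in range(64)]
log_mag, delta = magnitude_and_phase_delta(signal)
fused_frames = [gated_fusion(m, d) for m, d in zip(log_mag, delta)]
```

Each fused frame could then be passed to a sequence model such as the attention-plus-RNN backbone the abstract describes. The gate is a convex mix, so every fused value lies between the corresponding magnitude and phase feature values.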