Voice activity detection using convolutive non-negative sparse coding

Peng Teng,Yunde Jia
DOI: https://doi.org/10.1109/ICASSP.2013.6639095
2013-01-01
ICASSP
Abstract:This paper presents a voice activity detection (VAD) approach using convolutive non-negative sparse coding (CNSC) to improve the detection performance in low signal-to-noise (SNR) conditions. Our idea is to use noise-robust feature for speech signal detection while noise is reduced away. We first use magnitude spectrum as the non-negative and additive low-level representation of audio signals, and learn a speech dictionary from clean speech as well as a noise dictionary from noise samples. Then, the two dictionaries are concatenated to form a global dictionary, and an audio signal is decomposed into coefficient vectors using CNSC on the global dictionary. Only coefficients corresponding to the bases from the speech dictionary are taken as the features for the signal. At last, the activity labels is given by decoding a conditional random field (CRF) which is constructed to model the context of an audio signal for VAD. Experiments demonstrate that our VAD approach has an excellent performance in low SNR conditions.
What problem does this paper attempt to address?