Abstract: The performance of traditional voice activity detectors deteriorates significantly in the presence of highly nonstationary noise and transient interferences. One solution is to incorporate a video signal, which is invariant to the acoustic environment. Although several voice activity detectors based on the video signal were recently presented, only a few detectors based on both the audio and the video signals exist in the literature to date. In this paper, we present an audio-visual voice activity detector and show that incorporating both audio and video signals is highly beneficial for voice activity detection. The algorithm is based on a supervised learning procedure and assumes a labeled training data set. It comprises a feature extraction procedure in which the features are designed to separate speech frames from nonspeech frames. The diffusion maps method is applied separately, and in the same way, to the features of each modality to build a low-dimensional representation. Using the new representation, we propose a measure of voice activity that is based on a supervised learning procedure and on the variability between frames that are adjacent in time. The measures of the two modalities are merged to provide voice activity detection based on both the audio and the video signals. Experimental results demonstrate the improved performance of the proposed algorithm compared to state-of-the-art detectors.
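
The dimensionality-reduction step mentioned in the abstract can be illustrated with a minimal diffusion maps sketch. This is a generic textbook formulation and not necessarily the exact variant used in the paper: the Gaussian kernel, the normalization, and the names `diffusion_map`, `epsilon`, and `n_components` are assumptions made for illustration, and the abstract does not specify the features or the kernel scale.

```python
import numpy as np


def diffusion_map(features, epsilon, n_components=2):
    """Embed the rows of `features` (one row per frame) with a basic diffusion map.

    Illustrative sketch only: kernel choice and normalization are assumptions.
    """
    X = np.asarray(features, dtype=float)

    # Pairwise squared Euclidean distances between frames.
    sq_norms = np.sum(X * X, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values

    # Gaussian affinity kernel with scale `epsilon` (assumed hyperparameter).
    K = np.exp(-sq_dists / epsilon)
    d = K.sum(axis=1)

    # Symmetric normalization D^{-1/2} K D^{-1/2}; it has the same eigenvalues
    # as the row-stochastic Markov matrix P = D^{-1} K.
    S = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(S)

    # Sort eigenpairs by decreasing eigenvalue.
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]

    # Right eigenvectors of P are D^{-1/2} times the eigenvectors of S.
    psi = eigvecs / np.sqrt(d)[:, None]

    # Drop the trivial constant eigenvector and scale by the eigenvalues.
    return psi[:, 1:n_components + 1] * eigvals[1:n_components + 1]
```

In an audio-visual setting along the lines of the abstract, a function like this would be called once on the audio feature matrix and once on the video feature matrix, giving one low-dimensional representation per modality.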

Voice Activity Detection Using Wavelets Multiresolution Spectrum and Short-time Adaptive Audio Mixing Algorithm

Voice Activity Detection Based on Wavelet Multiresolution Spectrum

Applying Support Vector Machines to Voice Activity Detection

A Novel and Efficient Voice Activity Detector Using Shape Features of Speech Wave

An Algorithm of Voice Activity Detection Based on Noise Estimation

Efficient voice activity detection algorithm based on sub-band temporal envelope and sub-band long-term signal variability

An efficient voice activity detection algorithm by combining statistical model and energy detection

Improved Voice Activity Detection Based on Long-term Spectral Divergence and Pitch Ratio Features

An Effective Voice Activity Detection Algorithm in Mobile Communication Corrupted by Impulse Noise

Audio-visual voice activity detection using diffusion maps

Waveform-based Voice Activity Detection Exploiting Fully Convolutional networks with Multi-Branched Encoders

Robust Voice Activity Detection based on Pitch and Sub-band Energy

Speech enhancement aided end-to-end multi-task learning for voice activity detection

VAD Based on Speech Presence Probability

An Impulse Noise Robust Voice Activity Detection Algorithm Applied For Low Signal-To-Noise Ratio Digital Communication

Wavoice: A mmWave-Assisted Noise-Resistant Speech Recognition System

A Robust Algorithm of Double Talk Detection Based on Voice Activity Detection

Multimodal Voice Activity Detection

Voice Activity Detection Based on Time-Delay Neural Networks