An Ensemble SVM-based Approach for Voice Activity Detection

Jayanta Dey, Md Sanzid Bin Hossain, Mohammad Ariful Haque

DOI: https://doi.org/10.48550/arXiv.1902.01544

2019-02-05

Abstract: Voice activity detection (VAD), used as the front end of speech enhancement, speech recognition, and speaker recognition algorithms, determines the overall accuracy and efficiency of those algorithms. Therefore, a VAD with low complexity and high accuracy is highly desirable for speech processing applications. In this paper, we propose a novel method for training a supervised learning-based VAD system on a large dataset using support vector machines (SVMs). Despite the high classification accuracy of SVMs, a plain SVM is not suitable for classifying the large datasets needed for a good VAD system because of its high training complexity. To overcome this problem, a novel ensemble-based approach using SVMs is proposed in this paper. The performance of the proposed ensemble structure is compared with that of a feedforward neural network (NN). Although the NN performs better than a single SVM-based VAD trained on a small portion of the training data, the ensemble SVM gives accuracy comparable to the neural network-based VAD. The ensemble SVM and the NN achieve 88.74% and 86.28% accuracy, respectively, whereas the stand-alone SVM achieves 57.05% accuracy on average on the test dataset.

Sound, Machine Learning, Audio and Speech Processing

What problem does this paper attempt to address?

The paper addresses how to improve detection accuracy in voice activity detection (VAD) while keeping complexity low. Specifically, it proposes an ensemble learning method based on support vector machines (SVMs) for training supervised VAD systems on large-scale datasets. Although a conventional SVM has high classification accuracy, its high training complexity makes it unsuitable for direct application to large-scale datasets. The paper therefore proposes an ensemble SVM method that overcomes this limitation to achieve efficient and accurate VAD. The main contributions of the paper are:

1. **A new ensemble SVM method**: The large-scale dataset is split into multiple small datasets, an SVM is trained independently on each small dataset, and the predicted probabilities of these SVMs are used as input features for a final SVM classifier, which improves the overall performance of the system.
2. **Comparison with a neural network (NN)**: The paper compares the proposed ensemble SVM method with a conventional feedforward neural network. The ensemble SVM reaches 88.74% accuracy on the test dataset, while the stand-alone SVM and the neural network reach 57.05% and 86.28%, respectively.
3. **Improved time efficiency**: Unlike other methods in the literature, this method uses only mel-frequency cepstral coefficient (MFCC) features, which reduces feature extraction time and improves the real-time performance of the system.

Through these improvements, the paper aims to provide a more efficient and accurate VAD solution suitable for a range of speech processing applications.
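The ensemble structure described in the summary (split the data, train one SVM per split, feed their predicted probabilities to a final SVM) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions: a tiny linear SVM trained by stochastic sub-gradient descent stands in for a real SVM library, a plain sigmoid of the decision value stands in for a calibrated probability, random 3-dimensional Gaussian clusters stand in for MFCC frames, and the final SVM is trained on the base models' outputs for the full training set. All function names (`train_linear_svm`, `speech_probability`, `train_ensemble`, `make_toy_frames`) are hypothetical.

```python
import math
import random

def train_linear_svm(X, y, epochs=20, eta=0.01, lam=0.01, seed=0):
    """Linear SVM via SGD on the L2-regularised hinge loss; labels in {-1, +1}."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]  # weight decay step
            if margin < 1.0:  # sample violates the margin: hinge-loss step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def decision(model, x):
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def speech_probability(model, x):
    # Assumption: an uncalibrated sigmoid of the decision value, clamped to avoid overflow.
    s = max(-30.0, min(30.0, decision(model, x)))
    return 1.0 / (1.0 + math.exp(-s))

def train_ensemble(X, y, n_chunks=4):
    # Split the training set into disjoint chunks and train one base SVM per chunk.
    chunks = [(X[k::n_chunks], y[k::n_chunks]) for k in range(n_chunks)]
    base = [train_linear_svm(Xc, yc, seed=k) for k, (Xc, yc) in enumerate(chunks)]
    # The base SVMs' probabilities become the feature vector for the final SVM.
    meta_X = [[speech_probability(m, x) for m in base] for x in X]
    meta = train_linear_svm(meta_X, y, seed=n_chunks)
    return base, meta

def predict(ensemble, x):
    base, meta = ensemble
    probs = [speech_probability(m, x) for m in base]
    return 1 if decision(meta, probs) >= 0 else -1

def make_toy_frames(n, seed=0):
    # Hypothetical stand-in for MFCC frames: one Gaussian cluster per class.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice([-1, 1])  # +1 = speech, -1 = non-speech
        X.append([rng.gauss(1.5 * label, 1.0) for _ in range(3)])
        y.append(label)
    return X, y

X_train, y_train = make_toy_frames(400, seed=1)
X_test, y_test = make_toy_frames(100, seed=2)
ensemble = train_ensemble(X_train, y_train, n_chunks=4)
predictions = [predict(ensemble, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"toy accuracy: {accuracy:.2f}")
```

The point of the structure is that each base SVM sees only a fraction of the data, while the final SVM works on a feature vector whose length equals the number of base models rather than the original feature dimension. The toy accuracy printed here says nothing about the figures reported in the paper.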

Applying Support Vector Machines to Voice Activity Detection

A Novel and Efficient Voice Activity Detector Using Shape Features of Speech Wave.

Multi-task Joint-Learning for Robust Voice Activity Detection

Efficient voice activity detection algorithm based on sub-band temporal envelope and sub-band long-term signal variability

DNN-based Voice Activity Detection for Speaker Recognition

A Universal VAD Based on Jointly Trained Deep Neural Networks.

Efficient Multiple Kernel Support Vector Machine Based Voice Activity Detection

A Voice Activity Detection Method Based on DWT-MVNPDF

A Robust and Lightweight Voice Activity Detection Algorithm for Speech Enhancement at Low Signal-to-noise Ratio

Voice activity detection in the wild: A data-driven approach using teacher-student training

Bimodal Recurrent Neural Network for Audiovisual Voice Activity Detection

An energy-efficient voice activity detector using reconfigurable Gaussian base normalization deep neural network

End-to-End Speaker-Dependent Voice Activity Detection

sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks

Boosting Contextual Information for Deep Neural Network Based Voice Activity Detection

A machine learning approach for vocal fold segmentation and disorder classification based on ensemble method

Voice Activity Detection Based on Wavelet Multiresolution Spectrum

Audio-visual voice activity detection using diffusion maps

Self-Adaptive Soft Voice Activity Detection using Deep Neural Networks for Robust Speaker Verification

SVVAD: Personal Voice Activity Detection for Speaker Verification