An Ensemble SVM-based Approach for Voice Activity Detection

Jayanta Dey, Md Sanzid Bin Hossain, Mohammad Ariful Haque
DOI: https://doi.org/10.48550/arXiv.1902.01544
2019-02-05
Abstract: Voice activity detection (VAD), used as the front end of speech enhancement, speech recognition, and speaker recognition algorithms, determines the overall accuracy and efficiency of those algorithms. Therefore, a VAD with low complexity and high accuracy is highly desirable for speech processing applications. In this paper, we propose a novel method for training a supervised learning-based VAD system on a large dataset using support vector machines (SVM). Despite the high classification accuracy of SVMs, a conventional SVM is not suitable for classifying the large datasets needed for a good VAD system because of its high training complexity. To overcome this problem, a novel ensemble-based approach using SVM is proposed in this paper. The performance of the proposed ensemble structure is compared with a feedforward neural network (NN). Although the NN performs better than a single SVM-based VAD trained on a small portion of the training data, the ensemble SVM gives accuracy comparable to the neural network-based VAD. The ensemble SVM and the NN give 88.74% and 86.28% accuracy respectively, whereas the stand-alone SVM shows 57.05% accuracy on average on the test dataset.
Sound, Machine Learning, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses how to improve detection accuracy in Voice Activity Detection (VAD) while keeping complexity low. Specifically, it proposes an ensemble learning method based on the Support Vector Machine (SVM) for supervised VAD systems trained on large-scale datasets. Although a traditional SVM has high classification accuracy, its high training complexity makes it unsuitable for direct application to large-scale datasets. The paper therefore proposes a new ensemble SVM method that overcomes this limitation to achieve efficient and accurate VAD.

The main contributions of the paper include:

1. **Proposing a new ensemble SVM method**: A large-scale dataset is split into multiple small datasets, an SVM is trained independently on each small dataset, and the predicted probabilities of these SVMs are then used as input features for a final SVM classifier, which improves the overall performance of the system.
2. **Comparison with a Neural Network (NN)**: The paper compares the proposed ensemble SVM method with a feed-forward neural network. The ensemble SVM reaches 88.74% accuracy on the test dataset, while the stand-alone SVM and the neural network reach 57.05% and 86.28% respectively.
3. **Improvement in time efficiency**: Unlike other methods in the literature, this method uses only Mel-frequency cepstral coefficient (MFCC) features, which reduces feature extraction time and improves the real-time performance of the system.

Through these improvements, the paper aims to provide a more efficient and accurate VAD solution suitable for various speech processing applications.
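
The ensemble structure described in the first contribution can be sketched in a few lines. This is a minimal illustration and not the authors' implementation. It assumes scikit-learn's `SVC` with an RBF kernel and uses synthetic data from `make_classification` in place of real MFCC frames. The number of subsets, the 13-dimensional feature vector, the helper names (`train_ensemble_svm`, `stack_probabilities`, `predict_ensemble`), and the choice to train the final SVM on stacked probabilities for the whole training set are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def stack_probabilities(base_models, X):
    """Return one column per base SVM: its predicted probability of class 1."""
    return np.column_stack([m.predict_proba(X)[:, 1] for m in base_models])


def train_ensemble_svm(X, y, n_subsets=5, seed=0):
    """Train base SVMs on disjoint subsets, then a final SVM on their outputs."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(X))
    base_models = []
    for part in np.array_split(shuffled, n_subsets):
        # Each base SVM sees only a small slice of the data, which keeps
        # the per-model training cost low.
        clf = SVC(kernel="rbf", probability=True, random_state=seed)
        clf.fit(X[part], y[part])
        base_models.append(clf)
    # Assumption: the final SVM is trained on the stacked probabilities of
    # the whole training set (n_subsets features per frame).
    final_model = SVC(kernel="rbf", random_state=seed)
    final_model.fit(stack_probabilities(base_models, X), y)
    return base_models, final_model


def predict_ensemble(base_models, final_model, X):
    """Classify frames by feeding base-SVM probabilities to the final SVM."""
    return final_model.predict(stack_probabilities(base_models, X))


# Synthetic stand-in for per-frame MFCC vectors (13 features is an assumption).
X, y = make_classification(n_samples=1000, n_features=13, n_informative=8,
                           class_sep=2.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

base_models, final_model = train_ensemble_svm(X_train, y_train)
preds = predict_ensemble(base_models, final_model, X_test)
accuracy = float(np.mean(preds == y_test))
print(f"ensemble accuracy on synthetic data: {accuracy:.3f}")
```

Because the final classifier only sees `n_subsets` features per frame, its input is much lower-dimensional than the raw feature vectors. The accuracy printed here reflects the synthetic data only and says nothing about the figures reported in the paper.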