A robust accent classification system based on variational mode decomposition
Darshana Subhash, Jyothish Lal G., Premjith B., Vinayakumar Ravi
DOI: https://doi.org/10.1016/j.engappai.2024.109512
IF: 8
2024-11-03
Engineering Applications of Artificial Intelligence
Abstract: State-of-the-art automatic speech recognition models often struggle to capture the nuanced features of accented speech, which leads to sub-optimal performance in speaker recognition based on regional accents. Despite substantial progress in automatic speech recognition, robustness to accents and generalization across dialects remain persistent challenges, particularly in real-time settings. In response, this study introduces a novel approach that uses Variational Mode Decomposition (VMD) to enhance accented speech signals, with the aim of mitigating noise interference and improving generalization to unseen accented speech datasets. The method reconstructs the signal from the modes produced by the VMD algorithm and then extracts Mel-Frequency Cepstral Coefficients (MFCC) as features. These features are classified using machine learning models such as a 1D Convolutional Neural Network (1D-CNN), Support Vector Machine (SVM), Random Forest, and Decision Tree, as well as a deep learning model based on a 2D Convolutional Neural Network (2D-CNN). In the experiments, the SVM classifier achieves an accuracy of approximately 87.5% on a standard dataset and 99.3% on the AccentBase dataset. The 2D-CNN model further improves the results in multi-class accent classification tasks. This research contributes to more robust automatic speech recognition and accent-inclusive speaker recognition, addressing critical challenges in real-world applications.
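The enhancement step described in the abstract (decompose with VMD, then rebuild the signal from the modes) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a simplified VMD with no mirror extension and no Lagrangian multiplier, and the function name `vmd_sketch`, the parameters `n_modes`, `alpha`, `n_iter`, `tol`, and the synthetic two-tone test signal are all hypothetical choices made for illustration.

```python
import numpy as np


def vmd_sketch(signal, n_modes=2, alpha=2000.0, n_iter=200, tol=1e-9):
    """Simplified VMD of a real 1-D signal (illustrative assumption only).

    Simplifications relative to the full algorithm: no mirror extension,
    no Lagrangian multiplier, uniformly spaced initial center frequencies.
    Returns (modes, center_freqs); frequencies are in cycles per sample.
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    spectrum = np.fft.rfft(signal)        # one-sided spectrum of the input
    freqs = np.fft.rfftfreq(n)            # 0 ... 0.5 cycles per sample
    mode_spectra = np.zeros((n_modes, freqs.size), dtype=complex)
    omega = 0.5 * np.arange(n_modes) / n_modes  # initial center frequencies

    for _ in range(n_iter):
        omega_prev = omega.copy()
        for k in range(n_modes):
            # Residual: the input spectrum minus every other mode.
            residual = spectrum - (mode_spectra.sum(axis=0) - mode_spectra[k])
            # Wiener-like filter centered on the current omega[k].
            mode_spectra[k] = residual / (1.0 + 2.0 * alpha * (freqs - omega[k]) ** 2)
            # New center frequency: power-weighted mean frequency of the mode.
            power = np.abs(mode_spectra[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-12)
        # Stop once the center frequencies no longer move.
        if np.max(np.abs(omega - omega_prev)) < tol:
            break

    modes = np.fft.irfft(mode_spectra, n=n, axis=1)  # back to the time domain
    return modes, omega


# Hypothetical test signal: two tones plus white noise (not speech data).
rng = np.random.default_rng(0)
t = np.arange(1000)
clean = np.sin(2 * np.pi * 0.05 * t) + np.sin(2 * np.pi * 0.2 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)

modes, center_freqs = vmd_sketch(noisy, n_modes=2)
denoised = modes.sum(axis=0)              # reconstruction from the modes
mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
```

In a pipeline like the one the abstract outlines, the reconstructed signal would then go to an MFCC extractor (for example `librosa.feature.mfcc`) and the resulting features to a classifier such as `sklearn.svm.SVC`. The number of modes and the penalty `alpha` used here are placeholders, not values reported by the paper.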
automation & control systems; computer science, artificial intelligence; engineering, electrical & electronic; engineering, multidisciplinary