Abstract

Objectives: This study aims to classify normal and pathological voices by using the wav2vec 2.0 model as a feature extractor in conjunction with machine learning classifiers.

Methods: Voice recordings were sourced from the publicly accessible VOICED database. The data were preprocessed, including normalization and data augmentation, before being passed to the wav2vec 2.0 model for feature extraction. The extracted features were then used to train four machine learning models, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF), which were evaluated using Stratified K-Fold cross-validation. Model performance was assessed using accuracy, precision, recall, F1-score, macro and micro averages, the receiver-operating characteristic (ROC) curve, and the confusion matrix.

Results: The RF model achieved the highest accuracy (0.98 ± 0.02), alongside strong recall (0.97 ± 0.04), F1-score (0.95 ± 0.05), and consistently high area under the curve (AUC) values approaching 1.00. The DT model also performed well, particularly in precision (0.97 ± 0.02) and F1-score (0.96 ± 0.02), with AUC values ranging from 0.86 to 1.00. Macro-averaged and micro-averaged analyses showed that the DT model provided the most balanced and consistent performance across all classes, while the RF model was robust across multiple metrics. Data augmentation improved the accuracy, recall, F1-score, and AUC of all models, most notably for the RF and DT models. ROC curve analysis confirmed the consistency of the RF and SVM models across folds, and confusion matrix analysis showed that the RF and SVM models had the fewest misclassifications when distinguishing "Normal" from "Pathological" samples. Overall, the RF and DT models emerged as the most robust performers for the voice classification task in this study.

Conclusions: Combining wav2vec 2.0 feature extraction with machine learning models proved highly effective in classifying normal and pathological voices, achieving high accuracy and robustness across the evaluation metrics used.
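The evaluation protocol named in the abstract (Stratified K-Fold cross-validation with macro- and micro-averaged scores) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the function names `stratified_folds` and `f1_scores`, the toy labels, and the round-robin fold assignment are all assumptions made for illustration. A real pipeline would more likely rely on an established implementation such as scikit-learn's `StratifiedKFold`, applied to wav2vec 2.0 embeddings.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Split sample indices into k folds with roughly equal class ratios.

    Hypothetical helper: indices of each class are dealt round-robin
    across the folds (no shuffling, to keep the sketch deterministic).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for a single-label classification task."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class_f1 = []
    tp_sum = fp_sum = fn_sum = 0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    # Macro: average the per-class F1 scores, so each class counts equally.
    macro = sum(per_class_f1) / len(per_class_f1)
    # Micro: pool the counts over classes first, so each sample counts equally.
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / micro_denom if micro_denom else 0.0
    return macro, micro


# Toy, imbalanced label set (made up; not the VOICED class distribution).
labels = ["normal"] * 4 + ["pathological"] * 8
folds = stratified_folds(labels, k=4)

# Toy predictions with one "normal" sample misclassified as "pathological".
y_true = ["normal", "normal"] + ["pathological"] * 4
y_pred = ["normal", "pathological"] + ["pathological"] * 4
macro_f1, micro_f1 = f1_scores(y_true, y_pred)
```

With imbalanced classes, the macro average falls below the micro average when the minority class is classified less accurately. This is why reporting both, as the abstract does, helps show whether performance is balanced across "Normal" and "Pathological" samples.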
