Abstract:

Objectives: The study aims to classify normal and pathological voices by using the wav2vec 2.0 model as a feature extractor in conjunction with machine learning classifiers.

Methods: Voice recordings were sourced from the publicly accessible VOICED database. The data underwent preprocessing, including normalization and data augmentation, before being passed to the wav2vec 2.0 model for feature extraction. The extracted features were then used to train four machine learning models: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF). The models were evaluated using Stratified K-Fold cross-validation. Model performance was assessed with accuracy, precision, recall, F1-score, macro and micro averages, the receiver operating characteristic (ROC) curve, and the confusion matrix.

Results: The RF model achieved the highest accuracy (0.98 ± 0.02), alongside strong recall (0.97 ± 0.04), F1-score (0.95 ± 0.05), and consistently high area under the curve (AUC) values approaching 1.00, indicating superior classification performance. The DT model also demonstrated excellent performance, particularly in precision (0.97 ± 0.02) and F1-score (0.96 ± 0.02), with AUC values ranging from 0.86 to 1.00. Macro-averaged and micro-averaged analyses showed that the DT model provided the most balanced and consistent performance across all classes, while the RF model exhibited robust performance across multiple metrics. Additionally, data augmentation significantly enhanced the performance of all models, with marked improvements in accuracy, recall, F1-score, and AUC values, most notably in the RF and DT models. ROC curve analysis further confirmed the consistency and reliability of the RF and SVM models across different folds, while confusion matrix analysis revealed that the RF and SVM models had the fewest misclassifications in distinguishing "Normal" and "Pathological" samples. Consequently, the RF and DT models emerged as the most robust performers, making them particularly well suited to the voice classification task in this study.

Conclusions: Combining wav2vec 2.0 feature extraction with machine learning models proved highly effective in classifying normal and pathological voices, achieving exceptional accuracy and robustness across various evaluation metrics.
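
The evaluation protocol described in the Methods and Results (stratified folds, macro versus micro averaging, and scores reported as mean ± standard deviation across folds) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `stratified_folds`, `macro_micro_f1`, and `summarize` are hypothetical, the labels are toy data rather than VOICED recordings, the fold assignment is an unshuffled simplification of a stratified splitter, and the use of the population standard deviation is an assumption, since the abstract does not say which variant was reported.

```python
from statistics import mean, pstdev


def stratified_folds(labels, k):
    """Split sample indices into k folds with roughly equal class shares.

    Simplified, unshuffled stand-in for a stratified K-fold splitter:
    samples of each class are dealt round-robin across the folds.
    """
    folds = [[] for _ in range(k)]
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def macro_micro_f1(y_true, y_pred, classes):
    """Return (macro F1, micro F1) for single-label predictions."""
    counts = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        counts.append((tp, fp, fn))
    # Macro: average the per-class F1 scores, so each class counts equally.
    macro = mean([f1_from_counts(tp, fp, fn) for tp, fp, fn in counts])
    # Micro: pool the counts over classes first, so each sample counts equally.
    micro = f1_from_counts(
        sum(tp for tp, _, _ in counts),
        sum(fp for _, fp, _ in counts),
        sum(fn for _, _, fn in counts),
    )
    return macro, micro


def summarize(fold_scores):
    """Format per-fold scores as 'mean ± std' (population std is an assumption)."""
    return f"{mean(fold_scores):.2f} ± {pstdev(fold_scores):.2f}"


if __name__ == "__main__":
    # Toy, imbalanced labels (illustrative only).
    y_true = ["Pathological"] * 6 + ["Normal"] * 2
    y_pred = ["Pathological"] * 7 + ["Normal"]
    print(stratified_folds(y_true, 2))
    print(macro_micro_f1(y_true, y_pred, ["Normal", "Pathological"]))
    print(summarize([0.96, 1.00]))
```

On imbalanced data the two averages diverge: in the toy example one of the two "Normal" samples is misclassified, which lowers the macro average more than the micro average because the minority class carries equal weight in the former. For single-label classification the micro-averaged F1 equals plain accuracy.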

Pathological voice detection using optimized deep residual neural network and explainable artificial intelligence

Mapping Rugged Terrain for a Walking Robot

Voice disorder classification using speech enhancement and deep learning models

Diagnosis of pathological speech with streamlined features for long short-term memory learning

Voice Pathology Detection and Classification Using Convolutional Neural Network Model

[Determination of circulatory outputs and volumes by means of injections of indicator: validation on models].

A Novel Artificial-Intelligence-Based Approach for Classification of Parkinson’s Disease Using Complex and Large Vocal Features

Voice disorder detection using machine learning algorithms: An application in speech and language pathology

Multifeature Fusion Method with Metaheuristic Optimization for Automated Voice Pathology Detection

Voice disorder classification using convolutional neural network based on deep transfer learning

A Voice Disease Detection Method Based on MFCCs and Shallow CNN

Voice Disorder Analysis: a Transformer-based Approach

Improving Pathological Voice Detection: A Weakly Supervised Learning Method

Vocal Feature Extraction-Based Artificial Intelligent Model for Parkinson's Disease Detection

Leveraging Deep Learning for Fine-Grained Categorization of Parkinson's Disease Progression Levels through Analysis of Vocal Acoustic Patterns

Voice Disorder Detection Using Long Short Term Memory (LSTM) Model

Deep Learning Application for Vocal Fold Disease Prediction Through Voice Recognition: A Preliminary Development Study (Preprint)

Developing vocal system impaired patient-aimed voice quality assessment approach using ASR representation-included multiple features

Voice Disorder Classification Using Wav2vec 2.0 Feature Extraction

Evaluating the Diagnostic Potential of Connected Speech for Benign Laryngeal Disease Using Deep Learning Analysis

The cause of cirrhosis.