Abstract: With the recent development of speech-enabled interactive systems using artificial agents, there has been substantial interest in the analysis and classification of voice disorders to provide more inclusive systems for people living with specific speech and language impairments. In this paper, a two-stage framework is proposed to accurately classify diverse voice pathologies. The first stage consists of speech enhancement processing based on the original premise that an impaired voice can be treated as a noisy signal. To put this hypothesis into practice, the noise level is estimated using the cepstral harmonic-to-noise ratio (CHNR). The second stage consists of a convolutional neural network with long short-term memory (CNN-LSTM) architecture designed to learn complex features from spectrograms of the first-stage enhanced signals. A new sinusoidal rectified unit (SinRU) is proposed as the activation function of the CNN-LSTM network. The experiments are carried out using two subsets of the Saarbruecken voice database (SVD) with different etiologies, covering eight pathologies. The first subset contains voice recordings of patients with vocal cordectomy, psychogenic dysphonia, pachydermia laryngis, and frontolateral partial laryngectomy; the second subset contains voice recordings of patients with vocal fold polyp, chronic laryngitis, functional dysphonia, and vocal cord paresis. Identification of dysarthria severity levels in the Nemours and Torgo databases is also carried out. The experimental results showed that applying the minimum mean square error (MMSE)-based signal enhancer before the CNN-LSTM network with SinRU led to a significant improvement in the automatic classification of the investigated voice disorders and dysarthria severity levels.
These findings support the hypothesis that appropriate speech enhancement preprocessing improves the accuracy of automatic voice pathology classification by reducing the intrinsic noise induced by the voice impairment.
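The enhance-then-spectrogram idea can be illustrated with a minimal Python sketch built on several assumptions, since the exact MMSE estimator, noise tracking, and spectrogram settings are not specified here. The sketch uses a Wiener gain (the MMSE estimator of a complex STFT coefficient when speech and noise coefficients are modeled as Gaussian) with a decision-directed a priori SNR, and it estimates the noise spectrum from leading frames assumed to be noise-only rather than from CHNR. It then computes a time-major log-power spectrogram of the kind a CNN-LSTM could take as input. The function names and parameters (`mmse_wiener_enhance`, `log_spectrogram`, `noise_frames`, `alpha`, `xi_min`) are hypothetical and do not describe the authors' implementation.

```python
import numpy as np
from scipy.signal import istft, stft


def mmse_wiener_enhance(x, fs, nperseg=512, noise_frames=10, alpha=0.98, xi_min=1e-3):
    """Suppress additive noise with an MMSE (Wiener) spectral gain.

    Assumption (hypothetical, not from the paper): the first `noise_frames`
    STFT frames are noise-only, so their mean power serves as the noise PSD.
    """
    _, _, Y = stft(x, fs=fs, nperseg=nperseg)  # Y has shape (n_freq, n_frames)
    noise_psd = np.mean(np.abs(Y[:, :noise_frames]) ** 2, axis=1) + 1e-12

    X_hat = np.zeros_like(Y)
    prev_power = np.zeros(Y.shape[0])  # |X_hat|^2 from the previous frame
    for m in range(Y.shape[1]):
        # A posteriori SNR of the current frame.
        gamma = np.abs(Y[:, m]) ** 2 / noise_psd
        # Decision-directed a priori SNR: blend the previous clean estimate
        # with the instantaneous (gamma - 1) estimate, then floor it.
        xi = alpha * prev_power / noise_psd + (1 - alpha) * np.maximum(gamma - 1.0, 0.0)
        xi = np.maximum(xi, xi_min)
        # Wiener gain: the MMSE estimate of the complex STFT coefficient
        # under a Gaussian model for speech and noise coefficients.
        gain = xi / (1.0 + xi)
        X_hat[:, m] = gain * Y[:, m]
        prev_power = np.abs(X_hat[:, m]) ** 2

    _, x_hat = istft(X_hat, fs=fs, nperseg=nperseg)
    return x_hat[: len(x)]  # trim the padding added by stft


def log_spectrogram(x, fs, nperseg=512):
    """Log-power spectrogram, time-major: shape (n_frames, n_freq)."""
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)
    return np.log(np.abs(Z) ** 2 + 1e-10).T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16000
    t = np.arange(fs) / fs
    # Toy signal: 0.25 s of silence followed by a 440 Hz tone, plus white noise.
    clean = np.where(t >= 0.25, 0.5 * np.sin(2 * np.pi * 440 * t), 0.0)
    noisy = clean + 0.3 * rng.standard_normal(fs)
    enhanced = mmse_wiener_enhance(noisy, fs)
    print("MSE noisy:   ", np.mean((noisy - clean) ** 2))
    print("MSE enhanced:", np.mean((enhanced - clean) ** 2))
    print("spectrogram shape:", log_spectrogram(enhanced, fs).shape)
```

The spectrogram is transposed to (frames, frequency bins) so that per-frame CNN features can be fed to an LSTM in time order. In the pathological-voice setting the noise is intrinsic to the voice rather than confined to a separate leading segment, so the leading-frames estimate above is only a stand-in for the CHNR-based noise estimate.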

Improving Pathological Voice Detection: A Weakly Supervised Learning Method

PVD: A New Pathological Voice Dataset For Intra-Speaker Recognition Research Interest

Voice Pathology Detection and Classification Using Convolutional Neural Network Model

A Voice Disease Detection Method Based on MFCCs and Shallow CNN

Pathological voice detection using optimized deep residual neural network and explainable artificial intelligence

Multifeature Fusion Method with Metaheuristic Optimization for Automated Voice Pathology Detection

A hybrid model for pathological voice recognition of post-stroke dysarthria by using 1DCNN and double-LSTM networks

Self-supervised learning for pathological speech detection

Diagnosis of pathological speech with streamlined features for long short-term memory learning

Voice disorder classification using speech enhancement and deep learning models

Reproducible Machine Learning-based Voice Pathology Detection: Introducing the Pitch Difference Feature

Attentive-based Multi-level Feature Fusion for Voice Disorder Diagnosis

Voice Disorder Detection Using Long Short Term Memory (LSTM) Model

Supervised Classifiers for Audio Impairments with Noisy Labels

Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition

Voice disorder classification using convolutional neural network based on deep transfer learning

Combined Generative Adversarial Network and Fuzzy C-Means Clustering for Multi-Class Voice Disorder Detection with an Imbalanced Dataset

Convolutional Neural Network Classifies Pathological Voice Change in Laryngeal Cancer with High Accuracy

Quantitative analysis of automatic voice disorder detection studies for hybrid feature and classifier selection

Multi-class Detection of Pathological Speech with Latent Features: How does it perform on unseen data?