Wav2vec-based Detection and Severity Level Classification of Dysarthria from Speech

Farhad Javanmardi, Saska Tirronen, Manila Kodali, Sudarsana Reddy Kadiri, Paavo Alku
DOI: https://doi.org/10.1109/ICASSP49357.2023.10094857
2023-10-17
Abstract: Automatic detection and severity level classification of dysarthria directly from acoustic speech signals can be used as a tool in medical diagnosis. In this work, the pre-trained wav2vec 2.0 model is studied as a feature extractor to build detection and severity level classification systems for dysarthric speech. The experiments were carried out with the widely used UA-speech database. In the detection experiments, the best performance was obtained using the embeddings from the first layer of the wav2vec model, which yielded an absolute improvement of 1.23% in accuracy compared to the best-performing baseline feature (spectrogram). In the severity level classification task, the embeddings from the final layer gave an absolute improvement of 10.62% in accuracy compared to the best baseline features (mel-frequency cepstral coefficients).
Audio and Speech Processing, Computation and Language, Machine Learning, Sound, Signal Processing
What problem does this paper attempt to address?
This paper addresses the automatic detection and severity level classification of dysarthria directly from acoustic speech signals. It uses the pre-trained wav2vec 2.0 model as a feature extractor for both a detection system and a severity level classification system. The experiments were conducted on the widely used UA-speech database, and the results indicate:

1. **Detection Task**: Embeddings from the first layer of the wav2vec model performed best, with an absolute accuracy improvement of 1.23% over the best-performing baseline feature (spectrogram).
2. **Severity Classification Task**: Embeddings from the final layer of the wav2vec model performed best, with an absolute accuracy improvement of 10.62% over the best baseline features (mel-frequency cepstral coefficients, MFCCs).

These results suggest that wav2vec 2.0 embeddings are effective features for dysarthria-related tasks and that such systems could serve as an auxiliary tool in clinical diagnosis.
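
The layer-wise use of embeddings described above can be illustrated with a minimal sketch. It assumes that each layer produces a sequence of frame-level vectors and that these are averaged over time into one utterance-level vector before classification; the text above does not state the pooling method or the classifier, so both are assumptions here. The names `mean_pool` and `utterance_embedding` and the toy data are hypothetical and stand in for real model outputs.

```python
from statistics import fmean


def mean_pool(frames):
    """Average frame-level vectors (a list of equal-length lists) over time."""
    if not frames:
        raise ValueError("need at least one frame")
    # zip(*frames) groups the values of each dimension across all frames.
    return [fmean(dim_values) for dim_values in zip(*frames)]


def utterance_embedding(hidden_states, layer):
    """Select one layer's frame sequence and pool it into a single vector.

    hidden_states[k] holds the frames of layer k (hypothetical layout);
    negative indices select layers from the end, as with any Python list.
    """
    return mean_pool(hidden_states[layer])


# Toy stand-in for model outputs: 3 layers, 2 frames each, 2 dimensions.
# These numbers are illustrative only and are not results from the paper.
toy_hidden_states = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [2.0, 2.0]],
    [[5.0, 1.0], [7.0, 3.0]],
]

first_layer_vec = utterance_embedding(toy_hidden_states, 0)   # detection-style feature
final_layer_vec = utterance_embedding(toy_hidden_states, -1)  # severity-style feature
print(first_layer_vec, final_layer_vec)
```

With a real model, per-layer outputs would replace the toy data. In the Hugging Face `transformers` library, for example, `Wav2Vec2Model` returns them when called with `output_hidden_states=True`. Note that the first entry of that tuple is the input to the first transformer layer, so which index counts as the "first layer" depends on the convention used.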