Variable STFT Layered CNN Model for Automated Dysarthria Detection and Severity Assessment Using Raw Speech

Kodali Radha,Mohan Bansal,Venkata Rao Dulipalla
DOI: https://doi.org/10.1007/s00034-024-02611-7
IF: 2.311
2024-02-23
Circuits Systems and Signal Processing
Abstract:This paper presents a novel approach for automated dysarthria detection and severity assessment using a variable short-time Fourier transform layered convolutional neural networks (CNN) model. Dysarthria is a speech disorder characterized by difficulties in articulation, resulting in unclear speech. The model is evaluated on two datasets, TORGO and UA-Speech, consisting of individuals with dysarthria and healthy controls. Various variations of the CNN's first layer, including spectrogram, log spectrogram, and pre-emphasis filtering (PEF) with and without learnables, are investigated. Notably, the PEF with 5 learnables achieves the highest accuracy in detecting dysarthria and assessing its severity. The study highlights the significance of dataset size, with UA-Speech dataset showing superior performance due to its larger size, enabling better capture of dysarthria severity variations. This research contributes to the advancement of objective dysarthria assessment, aiding in early diagnosis and personalized treatment for individuals with speech disorders.
engineering, electrical & electronic
What problem does this paper attempt to address?