A New Method for Predicting Severity Level of Dysarthric Speech Based on Joint Feature-Sample Selection Using Audio-Visual Data

Shangjun Lu, Xiaoxia Du, Juan Liu, Yu-Mei Zhang, Shaofeng Zhao, Rongfeng Su, Lan Wang, Nan Yan
DOI: https://doi.org/10.1109/ialp57159.2022.9961300
2022-01-01
Abstract: Automatic objective assessment of dysarthria is valuable and crucial. Most previous studies use audio-only data and ignore the complementary information available in other modalities. In addition, traditional methods ignore the relationship between pre-defined features and different pronunciations, which degrades the performance of automatic assessment systems. To address these issues, this paper proposes a dysarthria severity level regression model based on joint feature-sample selection (JFSS) using audio-visual data. In the proposed framework, the JFSS method simultaneously selects relevant pronunciation samples and features while discarding unreliable, noisy samples. On the Mandarin Subacute Stroke Dysarthria Multimodal (MSDM) Database, the proposed regression model outperformed several baseline models. Using acoustic-visual features, it achieved a root mean square error (RMSE) of 13.78 and an R-squared (R²) fitting coefficient of 0.77 between the automatically predicted scores and the perceptual evaluation scores (i.e., the Frenchay Dysarthria Assessment), confirming the capacity of the proposed JFSS-based regression method to predict dysarthria severity level.
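The two evaluation metrics reported above can be made concrete with a short sketch. It assumes that "R-square" denotes the standard coefficient of determination, 1 − SS_res/SS_tot; the abstract does not give the exact formula, and some studies report the squared Pearson correlation instead, which can yield a different value. The function names `rmse` and `r_squared` are illustrative, and the scores below are hypothetical, not taken from the MSDM Database. This sketch covers only the evaluation step, not the JFSS method itself.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both score lists must be non-empty and aligned speaker by speaker.
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between reference (perceptual) and predicted scores."""
    _check(y_true, y_pred)
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq_err / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition of R-square)."""
    _check(y_true, y_pred)
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("reference scores have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical severity scores for four speakers (illustrative only, not real data).
reference = [100.0, 80.0, 60.0, 40.0]
predicted = [90.0, 85.0, 55.0, 50.0]
print(f"RMSE = {rmse(reference, predicted):.3f}")   # RMSE = 7.906
print(f"R^2  = {r_squared(reference, predicted):.3f}")  # R^2  = 0.875
```

With these toy scores the squared errors sum to 250, so RMSE = sqrt(250 / 4) ≈ 7.906, and the reference scores have a total sum of squares of 2000, so R² = 1 − 250 / 2000 = 0.875. Lower RMSE and higher R² both indicate closer agreement with the perceptual ratings.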
What problem does this paper attempt to address?