A Text-Dependent End-to-End Speech Sound Disorder Detection and Diagnosis in Mandarin-Speaking Children

Chen-Hsuan Hsiao, Shanq-Jang Ruan, Chih-Lun Chen, Ya-Wen Tu, Yu-Chin Chen, Griffani Megiyanto Rahmatullah
DOI: https://doi.org/10.1109/tim.2024.3438853
IF: 5.6
2024-08-21
IEEE Transactions on Instrumentation and Measurement
Abstract: Speech sound disorder (SSD) is a common communication disorder among children that can lead to difficulties in accurately articulating speech sounds. Automated detection of SSD can provide timely assessment and screening for a large population of children. This article presents a novel approach for automatically detecting and diagnosing SSD in Chinese-speaking children. The proposed approach uses a text-dependent end-to-end model that incorporates linguistic and acoustic features, with connectionist temporal classification (CTC) as the loss function. We collected a corpus of speech samples from 100 children aged three to nine, focusing on the speech patterns and characteristics of early childhood. Reliable annotations were provided by multiple professional speech and language pathologists (SLPs) for model training and evaluation. Experimental results demonstrate that the proposed approach outperforms the baseline convolutional neural network (CNN)–recurrent neural network (RNN)–CTC model in phoneme recognition, mispronunciation detection and diagnosis (MDD), and the identification of common phonological processes. The overall average F-measure for recognizing common phonological processes reached 66.16%.
engineering, electrical & electronic; instruments & instrumentation
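
The abstract names two standard building blocks: CTC, whose outputs are conventionally decoded by merging repeated frame labels and dropping blanks, and the F-measure, the harmonic mean of precision and recall. The sketch below illustrates both in generic form. It is not the authors' implementation; the blank symbol, the function names, and the example labels are assumptions made for illustration only.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; real systems usually reserve an integer index


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse per-frame labels the CTC way: merge consecutive repeats, then drop blanks."""
    return [label for label, _ in groupby(frame_labels) if label != blank]


def f_measure(true_positives, false_positives, false_negatives):
    """Return the F1 score (harmonic mean of precision and recall) from raw counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0  # no predictions or no positives: define the score as zero
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up frame labels, not data from the paper.
    print(ctc_greedy_collapse(list("__ss_u_u__")))
    print(f_measure(true_positives=8, false_positives=2, false_negatives=2))
```

A blank between two identical labels is what lets CTC emit the same phoneme twice in a row, which is why the blanks are removed only after repeats are merged.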