Abstract

Background and objective: Screening children for communication disorders such as specific language impairment (SLI) is challenging because clinicians must follow a multi-step procedure to evaluate each child. Artificial intelligence and computer-aided diagnosis have helped health professionals reach faster and more accurate decisions about children's neurodevelopmental status with respect to language comprehension and production. Previous studies have reported that typically developing (TD) and SLI children show distinct vocal characteristics that can help discriminate between the two groups. The objective of this study is to classify children as SLI or TD from their raw speech signals using two proposed approaches: a customized convolutional neural network (CNN) and a hybrid deep-learning framework that combines a CNN with a long short-term memory (LSTM) network.

Method: We used a publicly available speech database of Czech-speaking children, both with SLI and typically developing. The convolution filters in both the proposed CNN and the hybrid (CNN-LSTM) model estimated fuzzy-automated features directly from the speech utterances. We performed the experiments in five separate sessions and applied data augmentation in each session to strengthen training.

Results: The hybrid model achieved 100% accuracy and F-measure in almost all session trials, whereas the CNN alone achieved an average accuracy close to 90% and an F-measure of at least 92%. Both models also obtained reliability index values above 90%, indicating robust classification.

Conclusion: The results confirm the effectiveness of the proposed approaches for detecting SLI in children from their raw speech signals. Neither model requires a dedicated feature-extraction unit. The models may also be suitable for screening SLI and other neurodevelopmental disorders in children with different linguistic accents.
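To illustrate how a CNN-LSTM can operate on raw speech without a separate feature-extraction stage, the following is a minimal pure-Python sketch of a forward pass only. It is not the authors' architecture: the function names (conv1d_relu, max_pool, lstm_last_hidden, cnn_lstm_score), the number of filters, kernel size, pooling size, hidden size, and the 0.5 decision threshold are all hypothetical, and the weights are random and untrained. A real model would be trained (with the data augmentation described above) in a deep-learning framework.

```python
import math
import random


def conv1d_relu(signal, kernels, stride=1):
    """Valid 1D convolution of a raw waveform with each kernel, then ReLU.

    Returns one feature map per kernel, shaped [channels][time].
    """
    maps = []
    for kernel in kernels:
        k = len(kernel)
        fmap = []
        for start in range(0, len(signal) - k + 1, stride):
            acc = sum(signal[start + i] * kernel[i] for i in range(k))
            fmap.append(max(0.0, acc))  # ReLU
        maps.append(fmap)
    return maps


def max_pool(maps, size):
    """Non-overlapping max pooling along time, applied to every channel."""
    return [[max(fmap[i:i + size]) for i in range(0, len(fmap) - size + 1, size)]
            for fmap in maps]


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def lstm_last_hidden(sequence, params, hidden):
    """Run a single-layer LSTM over a list of input vectors; return the final hidden state."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in sequence:
        act = {}
        for gate in ("i", "f", "o", "g"):  # input, forget, output gates and candidate
            W, U, b = params[gate]
            pre = [b[j]
                   + sum(W[j][k] * x[k] for k in range(len(x)))
                   + sum(U[j][k] * h[k] for k in range(hidden))
                   for j in range(hidden)]
            act[gate] = [math.tanh(v) if gate == "g" else sigmoid(v) for v in pre]
        c = [act["f"][j] * c[j] + act["i"][j] * act["g"][j] for j in range(hidden)]
        h = [act["o"][j] * math.tanh(c[j]) for j in range(hidden)]
    return h


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def cnn_lstm_score(waveform, n_filters=4, kernel_size=9, pool=4, hidden=8, seed=0):
    """Toy, untrained CNN-LSTM forward pass; returns a score in (0, 1) read as P(SLI)."""
    rng = random.Random(seed)
    kernels = random_matrix(n_filters, kernel_size, rng, scale=0.5)
    maps = max_pool(conv1d_relu(waveform, kernels), pool)
    # Transpose [channels][time] -> [time][channels] so each LSTM step sees one frame.
    sequence = [list(frame) for frame in zip(*maps)]
    params = {g: (random_matrix(hidden, n_filters, rng),
                  random_matrix(hidden, hidden, rng),
                  [0.0] * hidden)
              for g in ("i", "f", "o", "g")}
    h = lstm_last_hidden(sequence, params, hidden)
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
    return sigmoid(sum(w * v for w, v in zip(w_out, h)))


if __name__ == "__main__":
    # Hypothetical 200-sample "utterance": a 440 Hz tone sampled at 8 kHz.
    fs = 8000
    wave = [math.sin(2 * math.pi * 440 * n / fs) for n in range(200)]
    p = cnn_lstm_score(wave)
    print("P(SLI) =", round(p, 3), "->", "SLI" if p >= 0.5 else "TD")
```

The convolution layer plays the role of the learned feature extractor, and the LSTM summarizes how those frame-level features evolve over the utterance; with random weights the printed score carries no diagnostic meaning.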

One-dimensional convolutional neural network and hybrid deep-learning paradigm for classification of specific language impaired children using their speech