Automatic Screening for Children with Speech Disorder using Automatic Speech Recognition: Opportunities and Challenges

Dancheng Liu, Jason Yang, Ishan Albrecht-Buehler, Helen Qin, Sophie Li, Yuting Hu, Amir Nassereldine, Jinjun Xiong
2024-10-08
Abstract: Speech is a fundamental aspect of human life, crucial not only for communication but also for cognitive, social, and academic development. Children with speech disorders (SD) face significant challenges that, if unaddressed, can result in lasting negative impacts. Traditionally, speech and language assessments (SLA) have been conducted by skilled speech-language pathologists (SLPs), but there is a growing need for efficient and scalable SLA methods powered by artificial intelligence. This position paper presents a survey of existing techniques suitable for automating SLA pipelines, with an emphasis on adapting automatic speech recognition (ASR) models for children's speech, an overview of current SLAs and their automated counterparts to demonstrate the feasibility of AI-enhanced SLA pipelines, and a discussion of practical considerations, including accessibility and privacy concerns, associated with the deployment of AI-powered SLAs.
Audio and Speech Processing, Computation and Language, Quantitative Methods
What problem does this paper attempt to address?
This paper addresses problems in how children with speech disorders (SD) are screened and assessed. Specifically:

1. **Limitations of Traditional Assessment Methods**:
   - Traditional speech and language assessment (SLA) is conducted primarily by professional speech-language pathologists (SLPs). It is time-consuming and resource-intensive, so many children who need services are not assessed or treated in time.
   - Each minute of recorded speech takes an experienced SLP 7 to 8 minutes to transcribe into SALT (Systematic Analysis of Language Transcripts) format, which severely limits assessment throughput.
2. **Need for Automated Assessment**:
   - Advances in deep learning have led researchers to explore using artificial intelligence, especially automatic speech recognition (ASR), to automate the SLA process and make assessment more efficient and scalable.
   - Automated SLA can substantially reduce the time spent on manual transcription and annotation, easing SLPs' workload so they can focus more on treatment.
3. **Adaptability Issues of Existing Technologies**:
   - Existing ASR models are built primarily for adult speech and recognize children's speech poorly. Children's speech differs from adults' in several ways, including rapid pitch changes, distinctive pronunciation patterns, and still-developing vocabulary.
   - The paper explores how fine-tuning and other techniques can adapt ASR models to children's speech.
4. **Privacy and Usability Issues**:
   - Children's speech data is highly sensitive and demands strong privacy protection. Running ASR models on edge devices, so that recordings need not leave the device, is therefore a feasible way to safeguard data security and privacy.
   - The paper also discusses the challenges of deploying automated SLA in resource-constrained environments, including limited memory, the trade-off between model quantization and model performance, computational overhead, and latency.
5. **Fairness and Explainability**:
   - Automated SLA frameworks must be highly explainable, particularly in a domain as sensitive as assessing children's speech disorders. Incorrect screening results can have serious consequences; a false negative, for example, can leave a child who needs support unidentified.
   - The paper stresses that automated SLA systems should be designed under SLP guidance and rely on validated tests to ensure that results are accurate and reliable.

In summary, this position paper surveys how deep learning and automatic speech recognition can be combined toward efficient, accurate, scalable, and privacy-preserving automated systems for screening and assessing children's speech disorders. The goal is to overcome the shortcomings of traditional methods and support earlier detection of, and intervention for, these disorders.
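The summary says current ASR models recognize children's speech poorly but does not name a metric. ASR accuracy is commonly reported as word error rate (WER): the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of reference words. The following is a minimal sketch, assuming whitespace-tokenized transcripts and no text normalization; `word_error_rate` is a hypothetical helper, not code from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words.

    Assumes transcripts are already normalized (case, punctuation) and
    tokenized by whitespace; this is an illustrative sketch only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical reference transcript vs. hypothetical ASR output.
    print(word_error_rate("the dog ran to the park", "the dog ran the bark"))
```

Here the output drops "to" and turns "park" into "bark", so the sketch reports 2 errors over 6 reference words. Real evaluations usually normalize case and punctuation before scoring, which this version deliberately leaves out.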
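The deployment discussion mentions memory limits and the trade-off between model quantization and model performance. The sketch below illustrates that trade-off with symmetric per-tensor int8 quantization, a common scheme chosen here as an assumption, since the summary does not say which scheme the paper considers. Storing weights in 8 bits instead of 32 cuts weight memory fourfold, at the cost of a rounding error of at most half a quantization step per weight. All function names and the 100-million-parameter figure are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (illustrative sketch).

    One scale is shared by the whole tensor: scale = max|w| / 127, and each
    weight is stored as q = round(w / scale), clamped to [-127, 127].
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]


def footprint_mb(num_params, bits_per_param):
    """Storage for the weights alone, ignoring activations and runtime overhead."""
    return num_params * bits_per_param / 8 / 1e6


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.003, 1.0]
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    # Round-to-nearest bounds the per-weight error by scale / 2.
    print(q, scale, worst)
    # Hypothetical 100M-parameter model: float32 vs. int8 weight storage in MB.
    print(footprint_mb(100_000_000, 32), footprint_mb(100_000_000, 8))
```

Note how the small weight 0.003 collapses to 0: a single shared scale favors large weights, which is one source of the accuracy loss. Production toolchains often use per-channel scales to reduce this effect; the stdlib-only version above is meant only to show where the memory savings and the error come from.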
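The summary warns that false negatives in screening can have serious consequences. In screening terms, sensitivity is TP / (TP + FN), and the false-negative (miss) rate is FN / (TP + FN) = 1 - sensitivity. The sketch below computes these from confusion counts and picks the highest decision threshold on a screener's risk scores that still reaches a target sensitivity. The class, function names, scores, and counts are all illustrative assumptions, not methods or results from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Confusion counts for a binary SD screener (positive = child has SD)."""
    true_positive: int   # SD present, screener flags the child
    false_negative: int  # SD present, screener misses the child
    true_negative: int   # no SD, screener does not flag
    false_positive: int  # no SD, screener flags anyway

    def sensitivity(self) -> float:
        return self.true_positive / (self.true_positive + self.false_negative)

    def false_negative_rate(self) -> float:
        return self.false_negative / (self.true_positive + self.false_negative)

    def specificity(self) -> float:
        return self.true_negative / (self.true_negative + self.false_positive)


def threshold_for_sensitivity(scores, labels, target):
    """Return the highest threshold t such that flagging every score >= t
    catches at least `target` (0 < target <= 1) of the positive examples."""
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    positive_scores = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not positive_scores:
        raise ValueError("need at least one positive example")
    # Number of positives we must catch; the small epsilon guards against
    # float products like 0.6 * 3 landing just above an integer.
    needed = math.ceil(target * len(positive_scores) - 1e-9)
    return positive_scores[needed - 1]


if __name__ == "__main__":
    counts = ScreeningCounts(true_positive=45, false_negative=5,
                             true_negative=90, false_positive=10)
    print(counts.sensitivity(), counts.false_negative_rate(), counts.specificity())

    # Hypothetical risk scores; label 1 means the child actually has SD.
    scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    print(threshold_for_sensitivity(scores, labels, target=1.0))
```

In this toy data, catching all three positive children requires a threshold of 0.3, which also flags the negative child scored 0.6. Lowering the threshold can only keep or raise the false-positive count, so choosing the target sensitivity is a clinical decision, consistent with the paper's emphasis on SLP guidance.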