Abstract: Autism Spectrum Disorder (ASD) is a lifelong condition that significantly influences an individual's communication abilities and social interactions. Early diagnosis and intervention are critical due to the profound impact of ASD's characteristic behaviors on foundational developmental stages. However, limitations of standardized diagnostic tools necessitate the development of objective and precise diagnostic methodologies. This paper proposes an end-to-end framework for automatically predicting the social communication severity of children with ASD from raw speech data. This framework incorporates an automatic speech recognition model, fine-tuned with speech data from children with ASD, followed by the application of fine-tuned pre-trained language models to generate a final prediction score. Achieving a Pearson Correlation Coefficient of 0.6566 with human-rated scores, the proposed method showcases its potential as an accessible and objective tool for the assessment of ASD.

What problem does this paper attempt to address?

This paper attempts to address the problem of assessing social communication severity in children with Autism Spectrum Disorder (ASD). Specifically, it proposes an end-to-end framework that automatically predicts a child's social communication severity score from raw speech data. The framework combines Automatic Speech Recognition (ASR) models with Pre-trained Language Models (PLMs), fine-tuning both to generate the final predicted scores.

### Background and Problem

Autism Spectrum Disorder (ASD) is a lifelong condition that severely affects an individual's communication abilities and social interactions. Early diagnosis and intervention are crucial during the foundational developmental stages. However, existing standardized diagnostic tools have several limitations, such as a scarcity of professionals, subjective biases, and lengthy assessment processes. Objective and accurate diagnostic methods are therefore urgently needed.

### Solution

The paper proposes an end-to-end framework that performs automatic assessment through the following steps:

1. **Automatic Speech Recognition (ASR) Model**: Select and fine-tune two multilingual ASR models (wav2vec2 and Whisper) to adapt to the speech characteristics of ASD children and typically developing (TD) children.
2. **Pre-trained Language Model (PLM)**: Fine-tune three PLMs (KR-BERT, KLUE/roberta-base, and KR-ELECTRA-Discriminator) using three methods: traditional fine-tuning, manual prompting, and P-tuning.
3. **Ensemble Method**: Use seed ensemble techniques to aggregate the predictions of multiple fine-tuned models, enhancing the robustness and accuracy of the predictions.

### Experiments and Results

- **Data Preparation**: Collected speech data from 168 ASD children and 40 TD children for fine-tuning the ASR models and PLMs.
- **Experimental Setup**: Covered a full-dataset setting and a low-resource setting, using all available training data and 20% of the training data, respectively.
- **Evaluation Metrics**: Used the Pearson Correlation Coefficient (PCC) to measure the relationship between the model's predicted outputs and human-annotated scores.

The experimental results show that the proposed framework performs well in predicting the social communication severity of ASD children, especially in data-limited scenarios. Notably, in the low-resource setting, certain combinations (such as the KLUE/roberta-base model with P-tuning) even outperformed human transcription.

### Discussion

- **ASR vs. Human Transcription**: In the low-resource setting, ASR transcription performed close to or even better than human transcription, demonstrating its potential in resource-limited situations.
- **ASR Model Selection**: Although the Whisper model had a lower error rate, the wav2vec2 model performed better in capturing ASD-related speech features.
- **PLM and Tuning Methods**: The choice of PLM and tuning method significantly impacted performance, with P-tuning showing outstanding results in certain cases.

### Conclusion

The paper proposes an end-to-end framework that fine-tunes ASR models and PLMs to automatically predict the social communication severity of ASD children from raw speech data. The experimental results indicate that this framework maintains high prediction accuracy even in data-limited scenarios, providing a new tool for early diagnosis and intervention of ASD. Future research will focus on improving the interpretability of the models to ensure their reliability and transparency in clinical applications.
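As a rough illustration of the seed ensemble and the PCC evaluation mentioned in the summary, the sketch below averages per-child predictions from several runs and correlates the result with human-rated scores. It is not the paper's implementation: the summary does not say how predictions are aggregated, so simple averaging is an assumption, and the function names `pearson` and `seed_ensemble`, the toy scores, and the noise model are all hypothetical.

```python
import math
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def seed_ensemble(predictions_by_seed):
    """Average per-child predictions across runs trained with different seeds.

    Assumption: plain averaging; the source only says predictions are
    aggregated, not how.
    """
    n = len(predictions_by_seed[0])
    if any(len(run) != n for run in predictions_by_seed):
        raise ValueError("every run must score the same children")
    return [mean(column) for column in zip(*predictions_by_seed)]


if __name__ == "__main__":
    # Hypothetical human-rated severity scores for eight children.
    human_scores = [2.0, 5.0, 7.0, 3.0, 9.0, 4.0, 6.0, 8.0]

    # Stand-in for fine-tuned models: true score plus seed-dependent noise.
    runs = []
    for seed in (0, 1, 2, 3, 4):
        rng = random.Random(seed)
        runs.append([s + rng.gauss(0.0, 1.5) for s in human_scores])

    ensembled = seed_ensemble(runs)
    for seed, run in enumerate(runs):
        print(f"seed {seed}: PCC = {pearson(run, human_scores):.4f}")
    print(f"ensemble: PCC = {pearson(ensembled, human_scores):.4f}")
```

Averaging over seeds tends to cancel run-to-run noise, which is one plausible reason an ensemble would be more robust than a single fine-tuned model. In a real pipeline, the noisy lists would be replaced by the scores each fine-tuned PLM produces from the ASR transcripts.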

Developing an End-to-End Framework for Predicting the Social Communication Severity Scores of Children with Autism Spectrum Disorder

ASDPred: an End-to-End Autism Screening Framework Using Few-Shot Learning

An Automated Assessment Framework for Atypical Prosody and Stereotyped Idiosyncratic Phrases Related to Autism Spectrum Disorder

A deep learning predictive classifier for autism screening and diagnosis

Reliably quantifying the severity of social symptoms in children with autism using ASDSpeech

Automatically Predicting Perceived Conversation Quality in a Pediatric Sample Enriched for Autism

Speech Corpus for Korean Children with Autism Spectrum Disorder: Towards Automatic Assessment Systems

A Two-stage Multi-modal Affect Analysis Framework for Children with Autism Spectrum Disorder

Detecting Autism Spectrum Disorders with Machine Learning Models Using Speech Transcripts

Classifying Autism from Crowdsourced Semi-Structured Speech Recordings: A Machine Learning Approach

Correlation and Predictive Ability Between Sensory Characteristics and Social Interaction of Children in Autism Spectrum Disorder

Objective Measurement of Social Communication Behaviors in Children with Suspected ASD During the ADOS-2

Development and Validation of a Joint Attention-Based Deep Learning System for Detection and Symptom Severity Assessment of Autism Spectrum Disorder

Proposing a System Level Machine Learning Hybrid Architecture and Approach for a Comprehensive Autism Spectrum Disorder Diagnosis

An Integrated Statistical and Clinically Applicable Machine Learning Framework for the Detection of Autism Spectrum Disorder

Computational Interpersonal Communication Model for Screening Autistic Toddlers: A Case Study of Response-to-Name

Classifying Autism From Crowdsourced Semistructured Speech Recordings: Machine Learning Model Comparison Study

Exploring Speech Pattern Disorders in Autism using Machine Learning

Computer-Aided Autism Spectrum Disorder Diagnosis With Behavior Signal Processing

Developing a New Autism Diagnosis Process Based on a Hybrid Deep Learning Architecture Through Analyzing Home Videos

Prediction and Analysis of Autism Spectrum Disorder Using Machine Learning Techniques