Abstract: Advances in machine learning and contactless sensors have enabled the understanding of complex human behaviors in healthcare settings. In particular, several deep learning systems have been introduced to enable comprehensive analysis of neurodevelopmental conditions such as Autism Spectrum Disorder (ASD). This condition affects children from their early developmental stages onwards, and diagnosis relies entirely on observing the child's behavior and detecting behavioral cues. However, the diagnostic process is time-consuming because it requires long-term behavior observation and because specialists are scarce. We demonstrate the effect of a region-based computer vision system to help clinicians and parents analyze a child's behavior. For this purpose, we adopt and enhance a dataset for analyzing autism-related actions using videos of children captured in uncontrolled environments (e.g., videos collected with consumer-grade cameras in varied settings). The data is pre-processed by detecting the target child in each video to reduce the impact of background noise. Motivated by the effectiveness of temporal convolutional models, we propose both light-weight and conventional models capable of extracting action features from video frames and classifying autism-related behaviors by analyzing the relationships between frames in a video. By extensively evaluating feature extraction and learning strategies, we demonstrate that the highest performance is attained by combining an Inflated 3D ConvNet with a Multi-Stage Temporal Convolutional Network. This model achieved a Weighted F1-score of 0.83 for the classification of the three autism-related actions. We also propose a light-weight solution that employs the ESNet backbone with the same action recognition model, achieving a competitive Weighted F1-score of 0.71 and enabling potential deployment on embedded systems.
Experimental results demonstrate that our proposed models can recognize autism-related actions from videos captured in uncontrolled environments and can thus assist clinicians in analyzing ASD.
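
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a Weighted F1-score is commonly computed: the per-class F1 scores are averaged with weights proportional to each class's number of true samples. The function name `weighted_f1` and the labels `action_a`, `action_b`, and `action_c` are hypothetical placeholders, not the paper's code or class names, and the toy predictions are illustrative only.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its support
    return score


# Hypothetical labels for three action classes (placeholders only).
y_true = ["action_a", "action_a", "action_b", "action_b", "action_c", "action_c"]
y_pred = ["action_a", "action_a", "action_b", "action_c", "action_c", "action_c"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Weighting by support means that classes with more samples contribute more to the final score, which matters when the action classes are imbalanced.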
