Abstract: Deep learning and advancements in contactless sensors have significantly enhanced our ability to understand complex human activities in healthcare settings. In particular, deep learning models utilizing computer vision have been developed to enable detailed analysis of human gestures, especially repetitive gestures, which are commonly observed behaviors in children with autism. This research work aims to identify repetitive behaviors indicative of autism by analyzing videos captured in natural settings as children engage in daily activities. The focus is on accurately categorizing real-time repetitive gestures such as spinning, head banging, and arm flapping. To this end, we utilize the publicly accessible Self-Stimulatory Behavior Dataset (SSBD) to classify these stereotypical movements. A key component of the proposed methodology is the use of VideoMAE, a model designed to improve both spatial and temporal analysis of video data through a masking and reconstruction mechanism. This model significantly outperformed traditional methods, achieving an accuracy of 97.7%, a 14.7% improvement over the previous state-of-the-art.
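The masking and reconstruction mechanism mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names `tube_mask` and `masked_mse`, the 14x14 patch grid, and the masking ratio of 0.9 are all assumptions chosen for illustration. The sketch hides the same spatial patches in every frame and scores a reconstruction only on the hidden patches.

```python
import numpy as np

def tube_mask(num_frames, grid_h, grid_w, mask_ratio=0.9, seed=0):
    """Return a boolean mask of shape (num_frames, grid_h, grid_w).

    True marks a hidden patch. The same spatial positions are hidden in
    every frame, so a hidden patch cannot simply be copied from a
    neighbouring frame. (mask_ratio=0.9 is an illustrative assumption.)
    """
    rng = np.random.default_rng(seed)
    num_patches = grid_h * grid_w
    num_masked = int(round(mask_ratio * num_patches))
    flat = np.zeros(num_patches, dtype=bool)
    flat[rng.choice(num_patches, size=num_masked, replace=False)] = True
    spatial = flat.reshape(grid_h, grid_w)
    # Repeat the spatial pattern along the time axis.
    return np.broadcast_to(spatial, (num_frames, grid_h, grid_w)).copy()

def masked_mse(pred, target, mask):
    """Mean squared error over hidden patches only.

    pred and target have shape (num_frames, grid_h, grid_w, patch_dim).
    """
    diff = (pred - target) ** 2
    return float(diff[mask].mean())

# Hypothetical example: 8 frames, each split into a 14x14 grid of patches.
mask = tube_mask(num_frames=8, grid_h=14, grid_w=14, mask_ratio=0.9)
target = np.ones((8, 14, 14, 4))
pred = np.zeros((8, 14, 14, 4))
loss = masked_mse(pred, target, mask)
```

In a real model the prediction would come from an encoder-decoder network that sees only the visible patches; here `pred` is a placeholder array so that the loss computation can be run on its own.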

What problem does this paper attempt to address?

This paper addresses the problem of identifying and classifying the repetitive behaviors of autistic children (such as spinning, head banging, and arm flapping) from videos recorded in natural environments. Specifically, the research aims to:

1. **Accurately identify and classify repetitive behaviors**: Use deep learning models and computer vision techniques to identify and classify, in real time, the typical repetitive behaviors that autistic children exhibit during daily activities. These behaviors are common hallmark features of autism and can assist in early diagnosis and intervention.
2. **Improve identification accuracy**: Introduce an advanced deep learning model (VideoMAE) together with data augmentation to improve on existing methods in feature extraction and behavior classification, and thereby increase the accuracy and robustness of identification.
3. **Address small sample size and noise**: Publicly available datasets such as SSBD are small and noisy, so the research improves the quality and diversity of the dataset through video clipping, object detection (YOLOv7), and data augmentation to support more effective model training and evaluation.

### Main methods and techniques

- **VideoMAE**: A video analysis model based on masked autoencoders. Its masking and reconstruction mechanism improves spatio-temporal feature extraction, which strengthens classification of complex repetitive behaviors.
- **YOLOv7**: A deep learning model for real-time object detection. By identifying and masking the largest object of interest in each frame, it reduces the interference of irrelevant details and increases the focus on repetitive behaviors.
- **Video augmentation**: Transformations of the video data (such as cropping, flipping, and color inversion) increase the diversity of the dataset, which matters most for small, noisy datasets like SSBD.

### Significance of the research

This research provides new technical means for the behavior analysis of autistic children and could support remote care and family support. By automatically identifying and classifying repetitive behaviors, such a system can help doctors make diagnoses earlier and provide personalized treatment plans. It could also be useful in regions with limited resources, helping families that cannot access advanced diagnostic techniques.

### Key achievements

- **High-accuracy classification**: On the SSBD dataset, the VideoMAE model achieves an accuracy of 97.7%, a 14.7% improvement over the previous state-of-the-art.
- **High robustness**: Through the application of YOLOv7 and video augmentation, the model works stably in complex and noisy environments, which improves its reliability in practical applications.

In conclusion, this research addresses several challenges in identifying the repetitive behaviors of autistic children by combining advanced deep learning techniques with computer vision methods, providing support for the early diagnosis of and intervention in autism.
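The preprocessing steps described above (focusing on the largest detected object, then cropping, flipping, and inverting colors) can be sketched in NumPy as follows. This is an illustrative sketch, not the paper's pipeline: the helper names are hypothetical, the bounding boxes are assumed to come from a detector such as YOLOv7 as `(x1, y1, x2, y2)` pixel coordinates, and "masking" is interpreted here as blacking out everything outside the largest box, which is one possible reading of the summary.

```python
import numpy as np

def keep_largest_box(frame, boxes):
    """Black out everything outside the largest box in one frame.

    frame: uint8 array of shape (H, W, C).
    boxes: list of (x1, y1, x2, y2) integer pixel boxes, assumed to come
           from an object detector. An empty list leaves the frame as is.
    """
    if not boxes:
        return frame.copy()
    x1, y1, x2, y2 = max(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
    out = np.zeros_like(frame)
    out[y1:y2, x1:x2] = frame[y1:y2, x1:x2]
    return out

def horizontal_flip(video):
    """Mirror every frame left to right. video: (T, H, W, C)."""
    return video[:, :, ::-1, :]

def invert_colors(video):
    """Invert pixel intensities of a uint8 video."""
    return 255 - video

def center_crop(video, size):
    """Crop a size x size window from the center of every frame."""
    _, h, w, _ = video.shape
    top, left = (h - size) // 2, (w - size) // 2
    return video[:, top:top + size, left:left + size, :]

# Hypothetical example data: one 10x10 frame with two detections.
frame = np.full((10, 10, 3), 7, dtype=np.uint8)
focused = keep_largest_box(frame, [(0, 0, 2, 2), (3, 3, 9, 8)])

# Hypothetical 2-frame clip used to exercise the augmentations.
video = np.arange(2 * 4 * 6 * 3, dtype=np.uint8).reshape(2, 4, 6, 3)
flipped = horizontal_flip(video)
inverted = invert_colors(video)
cropped = center_crop(video, 2)
```

Each augmentation returns a clip with the same label as its source, so applying them to every training video multiplies the number of examples without new recordings.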

Advanced Gesture Recognition in Autism: Integrating YOLOv7, Video Augmentation and VideoMAE for Video Analysis

Classification of Abnormal Hand Movement for Aiding in Autism Detection: Machine Learning Study

Vision-based activity recognition in children with autism-related behaviors

Computer Vision-Based Assessment of Autistic Children: Analyzing Interactions, Emotions, Human Pose, and Life Skills

Human Gesture and Gait Analysis for Autism Detection

Video-Based Autism Detection with Deep Learning

AGGRESSIVE ACTION IDENTIFICATION IN AUTISM SPECTRUM DISORDER USING VIDEO ANALYSIS

A Novel Dataset for Video-Based Autism Classification Leveraging Extra-Stimulatory Behavior

Gesture Classification in Electromyography Signals for Real-Time Prosthetic Hand Control Using a Convolutional Neural Network-Enhanced Channel Attention Model

Introducing SSBD+ Dataset with a Convolutional Pipeline for detecting Self-Stimulatory Behaviours in Children using raw videos

Single Shot Detector CNN and Deep Dilated Masks for Vision-Based Hand Gesture Recognition From Video Sequences

Guided Weak Supervision for Action Recognition with Scarce Data to Assess Skills of Children with Autism

Towards Child-Inclusive Clinical Video Understanding for Autism Spectrum Disorder

Towards Automatic Screening of Typical and Atypical Behaviors in Children With Autism

Deep learning for automatic stereotypical motor movement detection using wearable sensors in autism spectrum disorders

Social Visual Behavior Analytics for Autism Therapy of Children Based on Automated Mutual Gaze Detection

Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition

MMASD+: A Novel Dataset for Privacy-Preserving Behavior Analysis of Children with Autism Spectrum Disorder

Ensemble Modeling of Multiple Physical Indicators to Dynamically Phenotype Autism Spectrum Disorder

Enhancing early autism diagnosis through machine learning: Exploring raw motion data for classification

Computer vision tools for the non-invasive assessment of autism-related behavioral markers