Advanced Gesture Recognition in Autism: Integrating YOLOv7, Video Augmentation and VideoMAE for Video Analysis

Amit Kumar Singh, Trapti Shrivastava, Vrijendra Singh
2024-10-12
Abstract: Deep learning and advancements in contactless sensors have significantly enhanced our ability to understand complex human activities in healthcare settings. In particular, deep learning models utilizing computer vision have been developed to enable detailed analysis of human gesture recognition, especially repetitive gestures which are commonly observed behaviors in children with autism. This research work aims to identify repetitive behaviors indicative of autism by analyzing videos captured in natural settings as children engage in daily activities. The focus is on accurately categorizing real-time repetitive gestures such as spinning, head banging, and arm flapping. To this end, we utilize the publicly accessible Self-Stimulatory Behavior Dataset (SSBD) to classify these stereotypical movements. A key component of the proposed methodology is the use of VideoMAE, a model designed to improve both spatial and temporal analysis of video data through a masking and reconstruction mechanism. This model significantly outperformed traditional methods, achieving an accuracy of 97.7%, a 14.7% improvement over the previous state-of-the-art.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper aims to identify and classify the repetitive behaviors of autistic children (such as spinning, head banging, and arm flapping) by analyzing videos recorded in natural settings. Specifically, the research aims to:

1. **Accurately identify and classify repetitive behaviors**: Use deep learning models and computer vision techniques to identify and classify, in real time, the typical repetitive behaviors that autistic children exhibit during daily activities. These behaviors are common hallmark features of autism, so detecting them can assist early diagnosis and intervention.
2. **Improve identification accuracy**: Introduce an advanced deep learning model (VideoMAE) together with data augmentation to improve on existing methods in feature extraction and behavior classification, making identification more accurate and more robust.
3. **Address the problems of small samples and noise**: Publicly available datasets such as SSBD are small and noisy, so the research improves the quality and diversity of the dataset through video clipping, object detection (YOLOv7), and data augmentation to support more effective model training and evaluation.

### Main methods and techniques

- **VideoMAE**: A video analysis model based on masked autoencoders. It improves spatio-temporal feature extraction through a masking and reconstruction mechanism, which enhances classification performance on complex repetitive behaviors.
- **YOLOv7**: A deep learning model for real-time object detection. By identifying and masking the largest object of interest in each frame, it reduces interference from irrelevant details and increases the focus on repetitive behaviors.
- **Video augmentation techniques**: Varied processing of the video data (such as cropping, flipping, and color inversion) increases the diversity and richness of the dataset, which matters most for datasets like SSBD with limited data volume and high noise.

### Significance of the research

This research provides new technical means for analyzing the behavior of autistic children and could also support remote medical care and family support. By automatically identifying and classifying repetitive behaviors, the system can help doctors make diagnoses earlier and provide personalized treatment plans. It can also be useful in regions with limited resources, helping families that cannot access advanced diagnostic techniques.

### Key achievements

- **High-precision classification**: On the SSBD dataset, the VideoMAE model achieves an accuracy of 97.7%, a 14.7% improvement over the previous state of the art.
- **High robustness**: With YOLOv7 and video augmentation applied, the model works stably in complex and noisy environments, which improves its reliability in practical applications.

In conclusion, this research addresses multiple challenges in identifying the repetitive behaviors of autistic children by combining advanced deep learning techniques with computer vision methods, providing strong support for the early diagnosis and intervention of autism.
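
The preprocessing ideas mentioned above (focusing on the largest detected object, then cropping, flipping, and inverting colors) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names are hypothetical, frames are plain nested lists of grayscale values so the example has no dependencies, and the detection boxes are assumed to come from some detector such as YOLOv7. It also assumes that "masking" means blanking everything outside the largest box, which the summary does not state explicitly.

```python
from typing import List, Tuple

Frame = List[List[int]]          # grayscale frame: rows of pixel values in 0..255
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), with x2 and y2 exclusive


def largest_box(boxes: List[Box]) -> Box:
    """Return the detection box with the largest area."""
    return max(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))


def keep_only_box(frame: Frame, box: Box, fill: int = 0) -> Frame:
    """Blank every pixel outside the box (assumed reading of 'masking')."""
    x1, y1, x2, y2 = box
    return [
        [px if (x1 <= x < x2 and y1 <= y < y2) else fill
         for x, px in enumerate(row)]
        for y, row in enumerate(frame)
    ]


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the frame down to the box region."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


def horizontal_flip(frame: Frame) -> Frame:
    """Mirror the frame left to right."""
    return [row[::-1] for row in frame]


def invert_colors(frame: Frame) -> Frame:
    """Invert 8-bit pixel intensities."""
    return [[255 - px for px in row] for row in frame]


def augment_clip(clip: List[Frame]) -> List[List[Frame]]:
    """Return the original clip plus a flipped and an inverted copy.

    Each transform is applied to every frame of a copy, so the motion
    within a clip stays consistent.
    """
    return [
        clip,
        [horizontal_flip(f) for f in clip],
        [invert_colors(f) for f in clip],
    ]
```

For example, given a 3x4 frame and two candidate boxes, `largest_box` selects the bigger one, `keep_only_box` zeroes the pixels around it, and `augment_clip` turns a one-clip dataset into three clips. A real pipeline would more likely operate on NumPy arrays or tensors and use a library's augmentation utilities.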