Novel multimodal contrast learning framework using zero-shot prediction for abnormal behavior recognition

Hai Chuan Liu,Anis Salwa Mohd Khairuddin,Joon Huang Chuah,Xian Min Zhao,Xiao Dan Wang,Li Ming Fang,Si Bo Kong
DOI: https://doi.org/10.1007/s10489-024-05994-x
IF: 5.3
2024-12-11
Applied Intelligence
Abstract:Human abnormal behavior detection is important to ensure public safety and prevent unwanted incidents. Currently, recognition systems for human abnormal behavior adopt neural network models and perform standard 1-of-N majority voting procedures. However, recognizing human abnormal behaviors can be challenging due to lengthy and numerous video datasets and the limitations of existing methods that rely on predefined categories and scenarios. This study proposed a novel method named Visual Text Contrastive Learning (VTCL) for identifying abnormal human behavior in campus settings. The proposed model emphasizes semantic information from automatically labeled properties text and videos of abnormal behaviors, moving beyond simple numerical representations. The proposed method integrates the cross and multi-frame methods within the visual branch to improve spatial and temporal performance. In the textual branch, the proposed prompting technique captures the contextual backdrop of abnormal behaviors to enrich supervision with behavioral semantic information. Then, the model learns the visual-text features to enhance the learning process through contrastive learning techniques. In addition, this work also presented a new study to explore zero-shot campus abnormal behavior recognition (CABR). It lays the foundation for unlocking the implementation of highly available and robust CABR for multiple and even new scenarios. The proposed VTCL model demonstrated a Top-1 accuracy of 86.92% and a Top-5 accuracy of 98.14% on the CABR50 dataset, including fifty abnormal behaviors on campus, with competitive computational complexity. Furthermore, the zero-shot performance of the proposed model showed competitive outcomes when evaluated on additional datasets, including CABRZ6 and UCF-101.
computer science, artificial intelligence
What problem does this paper attempt to address?