MissionGNN: Hierarchical Multimodal GNN-based Weakly Supervised Video Anomaly Recognition with Mission-Specific Knowledge Graph Generation

Sanggeon Yun, Ryozo Masukawa, Minhyoung Na, Mohsen Imani
2024-10-31
Abstract: In the context of escalating safety concerns across various domains, the tasks of Video Anomaly Detection (VAD) and Video Anomaly Recognition (VAR) have emerged as critically important for applications in intelligent surveillance, evidence investigation, violence alerting, etc. These tasks, aimed at identifying and classifying deviations from normal behavior in video data, face significant challenges due to the rarity of anomalies, which leads to extremely imbalanced data, and the impracticality of extensive frame-level data annotation for supervised learning. This paper introduces a novel hierarchical graph neural network (GNN) based model MissionGNN that addresses these challenges by leveraging a state-of-the-art large language model and a comprehensive knowledge graph for efficient weakly supervised learning in VAR. Our approach circumvents the limitations of previous methods by avoiding heavy gradient computations on large multimodal models and enabling fully frame-level training without fixed video segmentation. Utilizing automated, mission-specific knowledge graph generation, our model provides a practical and efficient solution for real-time video analysis without the constraints of previous segmentation-based or multimodal approaches. Experimental validation on benchmark datasets demonstrates our model's performance in VAD and VAR, highlighting its potential to redefine the landscape of anomaly detection and recognition in video surveillance systems.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the following key problems in video anomaly detection (VAD) and video anomaly recognition (VAR):

1. **Data imbalance**: Because anomalous events are rare, training data is heavily skewed toward normal behavior, which makes supervised methods hard to train effectively.
2. **High cost of frame-level annotation**: Annotating video frame by frame takes a large amount of time and resources, which makes it impractical in real applications and limits the use of supervised methods.
3. **Limitations of existing weakly supervised methods**:
   - **Multi-instance learning (MIL) depends on fixed video segmentation**: It has difficulty handling anomalous events of different lengths and performs poorly in real-time analysis.
   - **Large multimodal models carry a heavy computational burden**: They require extensive gradient computation, which makes training slow and resource-intensive.

To solve these problems, the paper proposes a new model named MissionGNN. Its main contributions are as follows:

- **Hierarchical graph neural network (GNN)**: The hierarchical GNN captures semantic information without computing gradients through large multimodal models.
- **Automated mission-specific knowledge graph generation**: A large language model (such as GPT-4) and a knowledge graph (such as ConceptNet) are used to automatically generate mission-specific knowledge graphs that assist anomaly recognition.
- **Fully frame-level training**: Training and inference run directly at the frame level, without video segmentation, which improves the model's practicality and efficiency in real-time applications.
- **Lightweight short-term temporal model**: The model focuses only on short-term relationships and avoids the complexity of modeling long-term dependencies, which makes it better suited to real-time scenarios.

Together, these contributions let MissionGNN perform video anomaly detection and recognition under weak supervision while reducing the demand for computational resources and improving the model's accuracy and practicality.
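
As a rough illustration of the hierarchical GNN idea summarized above, the sketch below passes one frame embedding through a small layered concept graph, level by level, and reads out a feature vector at the final level. It is a toy under stated assumptions, not the paper's implementation: the function name `hierarchical_pass`, the graph layout, the mean aggregation, the element-wise fusion at the first level, and the random embeddings are all hypothetical. The only point it is meant to show is that the frame and concept embeddings can be treated as fixed inputs, so that the small per-level matrices are the only parts that would need training.

```python
import numpy as np

def hierarchical_pass(frame_emb, node_embs, levels, edges, weights):
    """Propagate features level by level through a layered DAG.

    frame_emb: (d,) embedding of one frame from a frozen encoder (assumed).
    node_embs: (n, d) fixed concept embeddings (assumed).
    levels:    list of lists of node ids, ordered from first to last level.
    edges:     dict mapping a node id to its parent ids in the previous level.
    weights:   list of (d, d) matrices, one per level after the first;
               in this sketch these are the only parts that would be trained.
    """
    n, d = node_embs.shape
    h = np.zeros((n, d))
    # First level: fuse each concept embedding with the frame embedding.
    for v in levels[0]:
        h[v] = node_embs[v] * frame_emb
    # Deeper levels: mean-aggregate parent states, add the node's own
    # embedding, then apply a linear map followed by ReLU.
    for k, level in enumerate(levels[1:]):
        for v in level:
            msg = h[edges[v]].mean(axis=0)
            h[v] = np.maximum(0.0, weights[k] @ (msg + node_embs[v]))
    # Read out the mean state of the final level.
    return h[levels[-1]].mean(axis=0)

# Hypothetical three-level graph with six concept nodes.
rng = np.random.default_rng(0)
d = 8
levels = [[0, 1, 2], [3, 4], [5]]
edges = {3: [0, 1], 4: [1, 2], 5: [3, 4]}
node_embs = rng.normal(size=(6, d))
weights = [rng.normal(size=(d, d)) * 0.1 for _ in levels[1:]]
frame_emb = rng.normal(size=d)

out = hierarchical_pass(frame_emb, node_embs, levels, edges, weights)
print(out.shape)
```

In a real system the readout would feed a classifier over anomaly types, and consecutive frame readouts could be passed to a short-term temporal model; both are omitted here to keep the sketch small.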