Abstract: In the context of escalating safety concerns across various domains, the tasks of Video Anomaly Detection (VAD) and Video Anomaly Recognition (VAR) have emerged as critically important for applications in intelligent surveillance, evidence investigation, violence alerting, etc. These tasks, aimed at identifying and classifying deviations from normal behavior in video data, face significant challenges due to the rarity of anomalies, which leads to extremely imbalanced data, and the impracticality of extensive frame-level annotation for supervised learning. This paper introduces MissionGNN, a novel hierarchical graph neural network (GNN)-based model that addresses these challenges by leveraging a state-of-the-art large language model and a comprehensive knowledge graph for efficient weakly supervised learning in VAR. Our approach circumvents the limitations of previous methods by avoiding heavy gradient computations on large multimodal models and enabling fully frame-level training without fixed video segmentation. Using automated, mission-specific knowledge graph generation, our model provides a practical and efficient solution for real-time video analysis without the constraints of previous segmentation-based or multimodal approaches. Experimental validation on benchmark datasets demonstrates our model's performance in VAD and VAR, highlighting its potential to redefine the landscape of anomaly detection and recognition in video surveillance systems.
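The abstract attributes part of MissionGNN's efficiency to avoiding heavy gradient computations on large multimodal models, and names extreme class imbalance as a core difficulty. The sketch below shows one generic pattern with both properties; it is an assumption for illustration, not the paper's training procedure. Embeddings from a frozen pretrained encoder are computed once and cached, only a small linear head is trained on them, and the rare anomalous class is up-weighted by n_neg / n_pos in the binary cross-entropy. The names `train_head` and `weighted_bce` are hypothetical, and random vectors stand in for real encoder output.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def weighted_bce(logits, labels, pos_weight):
    """Binary cross-entropy with the rare positive (anomalous) class up-weighted."""
    p = sigmoid(logits)
    eps = 1e-12  # avoid log(0)
    loss = -(pos_weight * labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))
    return float(loss.mean())

def train_head(embs, labels, epochs=200, lr=0.1):
    """Fit a linear head on cached embeddings; the encoder itself is never differentiated."""
    n_pos = labels.sum()
    pos_weight = (len(labels) - n_pos) / max(n_pos, 1)  # n_neg / n_pos
    w, b = np.zeros(embs.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(embs @ w + b)
        # Gradient of the weighted BCE w.r.t. each logit, averaged over the batch.
        g = (pos_weight * labels * (p - 1) + (1 - labels) * p) / len(labels)
        w -= lr * (embs.T @ g)
        b -= lr * g.sum()
    return w, b, pos_weight

# Hypothetical stand-in for frozen-encoder output: 95 normal frames, 5 anomalous ones.
rng = np.random.default_rng(0)
embs = np.vstack([rng.normal(-1.0, 0.3, (95, 4)), rng.normal(1.0, 0.3, (5, 4))])
labels = np.concatenate([np.zeros(95), np.ones(5)])
w, b, pos_weight = train_head(embs, labels)
```

Because no gradient flows into the encoder, each training step costs one small matrix product regardless of encoder size. In PyTorch the same class weighting is usually obtained through the `pos_weight` argument of `BCEWithLogitsLoss`.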

What problem does this paper attempt to address?

This paper attempts to solve key problems in video anomaly detection (VAD) and video anomaly recognition (VAR), specifically:

1. **Data imbalance**: Because abnormal events are rare, normal and abnormal behavior are severely unbalanced in the training data, which makes supervised-learning-based methods difficult to train effectively.
2. **High cost of frame-level annotation**: Annotating video at the frame level requires a large amount of time and resources, which is impractical in real applications and limits the use of supervised learning.
3. **Limitations of existing weakly supervised methods**:
   - **Multiple instance learning (MIL) depends on fixed video segmentation**: it struggles with abnormal events of varying length and performs poorly in real-time analysis.
   - **Large multimodal models are computationally heavy**: they require extensive gradient computation, making training slow and resource-intensive.

To address these problems, the paper proposes a new model named MissionGNN, whose main contributions are:

- **Hierarchical graph neural network (GNN)**: The hierarchical GNN captures semantic information without computing gradients through large multimodal models.
- **Automatic mission-specific knowledge graph generation**: A large language model (such as GPT-4) and a knowledge graph (such as ConceptNet) are used to automatically generate mission-specific knowledge graphs that assist anomaly recognition.
- **Full frame-level training**: Training and inference run directly at the frame level, without video segmentation, which improves the model's practicality and efficiency in real-time applications.
- **Lightweight short-term temporal model**: The model focuses only on short-term temporal relationships and avoids the complexity of long-term dependency modeling, which makes it better suited to real-time scenarios.

Through these innovations, MissionGNN performs video anomaly detection and recognition efficiently under weak supervision, reduces the demand for computational resources, and improves the accuracy and practicality of the model.
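The criticism that MIL depends on fixed video segmentation can be made concrete with the classic segment-based setup of Sultani et al. (2018), named here as a swapped-in reference point, not MissionGNN's method: each video is cut into 32 equal segments and a hinge ranking loss asks the top anomalous segment to outscore the top normal segment. The sketch keeps only that core hinge term (omitting the smoothness and sparsity terms) and, as a simplification, averages per-frame scores rather than features. The helper names are hypothetical.

```python
import numpy as np

def segment_means(frame_scores, num_segments=32):
    """Average per-frame values inside num_segments equal-length chunks."""
    scores = np.asarray(frame_scores, dtype=float)
    if len(scores) < num_segments:
        raise ValueError("need at least one frame per segment")
    return np.array([chunk.mean() for chunk in np.array_split(scores, num_segments)])

def mil_ranking_loss(anomalous_segs, normal_segs, margin=1.0):
    """Hinge term: the top anomalous segment should beat the top normal one by `margin`."""
    return max(0.0, margin - float(np.max(anomalous_segs)) + float(np.max(normal_segs)))

# A 3200-frame video where an ideal frame-level detector fires on only 3 frames.
frame_scores = np.zeros(3200)
frame_scores[1000:1003] = 1.0
segs = segment_means(frame_scores)
peak = segs.max()  # 3 / 100: the short event is diluted across a 100-frame segment
```

A 3-frame event ends up with a segment score of only 0.03 even under a perfect frame detector, while a 100-frame event aligned with a segment would keep a score of 1.0; this length sensitivity is the kind of problem that frame-level training avoids.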
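Automatic mission-specific knowledge graph generation expands a few seed concepts into a layered graph of related concepts. The sketch below assumes a breadth-first expansion with a fixed depth; the `neighbors` callback is hypothetical and stands in for whatever queries the paper actually issues to the language model or ConceptNet, and the toy `RELATED` table is invented for illustration. Links that point back to the current or an earlier level are dropped so that the result stays strictly hierarchical.

```python
def build_layered_graph(seeds, neighbors, max_depth=2):
    """Expand seed concepts breadth-first into levels, keeping edges between adjacent levels only.

    neighbors(concept) -> list of related concepts (hypothetical; e.g. backed by an LLM or ConceptNet).
    """
    levels = [list(seeds)]
    seen = set(seeds)
    edges = []
    frontier = list(seeds)
    for _ in range(max_depth):
        nxt, nxt_set = [], set()
        for node in frontier:
            for nb in neighbors(node):
                if nb in seen and nb not in nxt_set:
                    continue  # skip links back to the current or earlier levels
                if nb not in nxt_set:
                    nxt.append(nb)
                    nxt_set.add(nb)
                    seen.add(nb)
                edges.append((node, nb))
        if not nxt:
            break
        levels.append(nxt)
        frontier = nxt
    return levels, edges

RELATED = {  # toy stand-in for LLM / ConceptNet lookups
    "fighting": ["punch", "kick"],
    "punch": ["fist"],
    "kick": ["leg", "fist"],
    "fist": ["hand"],
}
levels, edges = build_layered_graph(["fighting"], lambda c: RELATED.get(c, []))
```

Here `levels` holds the concepts level by level (`"hand"` is not reached because `max_depth=2`), and `edges` keeps both parents of `"fist"`, so a node can aggregate evidence from several concepts one level up.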
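The hierarchical GNN contribution can be pictured as level-by-level message passing: a frame embedding enters at the bottom level and is refined through successive levels of concept nodes. The following is a minimal sketch under stated assumptions, not the paper's architecture: each node takes the mean of its parents' states, adds its own concept embedding, and applies a per-level linear map followed by ReLU. The function name and this parameterization are hypothetical.

```python
import numpy as np

def hierarchical_forward(sensor_emb, adjacencies, concept_embs, weights):
    """Propagate a frame embedding level by level through a layered concept graph.

    sensor_emb:       (d,) embedding of the current frame (level 0, a single node).
    adjacencies[i]:   (n_{i+1}, n_i) 0/1 matrix; row j marks the parents of node j at level i+1.
    concept_embs[i]:  (n_{i+1}, d) embeddings of the concept nodes at level i+1.
    weights[i]:       (d, d) learnable matrix for step i (hypothetical parameterization).
    """
    h = sensor_emb[None, :]  # states of the current level, shape (n_i, d)
    for adj, emb, w in zip(adjacencies, concept_embs, weights):
        deg = adj.sum(axis=1, keepdims=True)
        msg = (adj @ h) / np.maximum(deg, 1.0)  # mean over parent nodes
        h = np.maximum((msg + emb) @ w, 0.0)    # mix with the node's own embedding, then ReLU
    return h  # states of the last level

# Hypothetical graph: 1 sensor node -> 4 concepts -> 2 concepts -> 1 output node.
rng = np.random.default_rng(0)
d = 8
adjs = [np.ones((4, 1)), np.array([[1, 1, 0, 0], [0, 0, 1, 1]], float), np.ones((1, 2))]
concepts = [rng.normal(size=(n, d)) for n in (4, 2, 1)]
ws = [rng.normal(scale=0.3, size=(d, d)) for _ in range(3)]
out = hierarchical_forward(rng.normal(size=d), adjs, concepts, ws)
```

Only the small per-level matrices `ws` would receive gradients in this setup; the frame and concept embeddings can be precomputed by a frozen encoder.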
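The lightweight short-term temporal model trades long-range context for bounded cost per frame. A minimal way to get that property, assumed here for illustration and not taken from the paper, is a causal sliding window: keep only the last `window` frame embeddings and summarize them for the newest frame. The class name `ShortTermContext` and the mean pooling are hypothetical stand-ins for whatever temporal module the paper uses.

```python
from collections import deque

import numpy as np

class ShortTermContext:
    """Causal context over the most recent `window` frame embeddings (bounded memory)."""

    def __init__(self, window):
        self.buffer = deque(maxlen=window)  # oldest embedding is dropped automatically

    def push(self, emb):
        """Add the newest frame embedding and return its short-term temporal feature."""
        self.buffer.append(np.asarray(emb, dtype=float))
        # Mean pooling over the window; a learned module could replace this.
        return np.mean(np.stack(self.buffer), axis=0)

ctx = ShortTermContext(window=3)
outs = [ctx.push([float(i)]) for i in range(1, 6)]
```

Each call touches at most `window` embeddings and uses only past frames, so per-frame latency and memory stay constant during streaming inference, which is the property the summary attributes to the short-term model.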

MissionGNN: Hierarchical Multimodal GNN-based Weakly Supervised Video Anomaly Recognition with Mission-Specific Knowledge Graph Generation
