Bounding Boxes and Probabilistic Graphical Models: Video Anomaly Detection Simplified

Mia Siemon,Thomas B. Moeslund,Barry Norton,Kamal Nasrollahi
2024-07-08
Abstract: In this study, we formulate the task of Video Anomaly Detection (VAD) as a probabilistic analysis of object bounding boxes. We hypothesize that representing objects by their bounding boxes alone can be sufficient to identify anomalous events in a scene. The implied value of this approach is increased object anonymization, faster model training, and fewer computational resources. This can particularly benefit video surveillance applications running on edge devices such as cameras. We design our model based on human reasoning, which lends itself to explaining model output in human-understandable terms. Meanwhile, the slowest model trains in less than 7 seconds on an 11th Generation Intel Core i9 Processor. While our approach constitutes a drastic reduction of the problem feature space in comparison with prior art, we show that this does not result in a reduction in performance: the results we report are highly competitive on the benchmark datasets CUHK Avenue and ShanghaiTech, and significantly exceed the latest State-of-the-Art results on StreetScene, which has so far proven to be the most challenging VAD dataset.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses several key challenges in video anomaly detection (VAD):

1. **Simplifying problem representation**: The authors hypothesize that representing objects by their bounding boxes alone can be sufficient to identify anomalous events in a scene. The advantages of this approach are:
   - Increased object anonymization
   - Faster model training
   - Lower demand for computing resources
2. **Improving applicability on edge devices**: Especially for video surveillance applications running on edge devices (such as cameras), this simplified approach can significantly reduce the need for high-performance hardware.
3. **Maintaining or improving performance**: Although the feature space is greatly reduced, the authors aim to show that detection performance does not degrade and that the approach achieves results comparable to or better than existing methods on the benchmark datasets.

### Research background and motivation

Traditional VAD methods usually rely on complex deep-learning models, which require a large amount of computing resources for training and inference. In practical video surveillance scenarios, especially on edge devices, such resource-demanding models are difficult to deploy. Researchers have therefore been looking for more efficient, lightweight solutions.

### Main contributions

The main contributions of this paper can be summarized as follows:

1. **A new perspective on problem domain representation**: A method for video anomaly detection that uses only object bounding boxes is proposed, reducing the need for complex feature extraction.
2. **Fully object-centered VAD based on discrete Bayesian networks**: Visual anomalies in the video stream are detected by learning high-dimensional bounding box attributes.
3. **Interpretability**: The model output can be explained in human-understandable terms, making the results more transparent and credible.
4. **A new SOTA baseline**: The method significantly surpasses the previous best results on the StreetScene dataset, with an average increase of about 4% in RBDC/TBDC scores, and achieves competitive results on the CUHK Avenue and ShanghaiTech datasets.

### Method overview

The method consists of two modules:

- **Image pre-processing module**: The multi-object tracker (MOT) BoT-SORT is used for object tracking to generate bounding boxes.
- **Video anomaly detection module**: A discrete Bayesian network (BN) analyzes the spatio-temporal attributes of the bounding boxes and outputs a probability score for each bounding box, indicating how likely the object is to appear in the scene.

In this way, the authors reduce the VAD task to a probabilistic graphical model problem and demonstrate its effectiveness on multiple benchmark datasets.

### Conclusion

This paper proposes a novel and efficient video anomaly detection method that simplifies the problem representation, improves model interpretability, and increases applicability on edge devices, while maintaining competitive detection performance.
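
### Illustrative sketch

To make the idea of scoring bounding boxes with a discrete Bayesian network more concrete, the following is a minimal sketch, not the authors' implementation. The summary above does not specify the network structure, the attributes, or the discretization used in the paper, so everything here is an assumption made for illustration: the three variables (object class, grid cell, size bin), the structure class → cell and (class, cell) → size, the frame size, the grid resolution, the area thresholds, and the names `Box`, `discretize`, and `TinyBoxBN`. The boxes are assumed to come from an upstream tracker such as BoT-SORT; tracking itself is not shown.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical discretization parameters (assumptions, not from the paper).
FRAME_W, FRAME_H = 640, 360
GRID_COLS, GRID_ROWS = 8, 4
SIZE_EDGES = (2_000, 10_000)      # box-area thresholds in pixels -> 3 size bins
N_CELLS = GRID_COLS * GRID_ROWS
N_SIZES = len(SIZE_EDGES) + 1


@dataclass(frozen=True)
class Box:
    cls: str    # object class reported by the detector/tracker
    x: float    # left edge in pixels
    y: float    # top edge in pixels
    w: float    # width in pixels
    h: float    # height in pixels


def discretize(box):
    """Map a box to discrete attributes: (class, grid cell index, size bin)."""
    cx = box.x + box.w / 2
    cy = box.y + box.h / 2
    # Clamp so that centers on or outside the frame border stay in range.
    col = min(max(int(cx / FRAME_W * GRID_COLS), 0), GRID_COLS - 1)
    row = min(max(int(cy / FRAME_H * GRID_ROWS), 0), GRID_ROWS - 1)
    size_bin = sum(box.w * box.h >= edge for edge in SIZE_EDGES)
    return box.cls, row * GRID_COLS + col, size_bin


class TinyBoxBN:
    """Toy discrete BN with the assumed factorization
    P(class, cell, size) = P(class) * P(cell | class) * P(size | class, cell).
    Conditional tables are estimated from counts with additive smoothing."""

    def __init__(self, classes, alpha=1.0):
        self.classes = tuple(classes)   # all classes the detector can emit
        self.alpha = alpha              # smoothing pseudo-count
        self.n = 0
        self.count_c = Counter()
        self.count_cg = Counter()
        self.count_cgs = Counter()

    def fit(self, boxes):
        for box in boxes:
            c, g, s = discretize(box)
            self.n += 1
            self.count_c[c] += 1
            self.count_cg[c, g] += 1
            self.count_cgs[c, g, s] += 1
        return self

    def probability(self, box):
        """Joint probability of the box's attributes; low values suggest an anomaly."""
        c, g, s = discretize(box)
        a = self.alpha
        p_c = (self.count_c[c] + a) / (self.n + a * len(self.classes))
        p_g = (self.count_cg[c, g] + a) / (self.count_c[c] + a * N_CELLS)
        p_s = (self.count_cgs[c, g, s] + a) / (self.count_cg[c, g] + a * N_SIZES)
        return p_c * p_g * p_s


# Usage: train on "normal" boxes only, then compare scores of unseen boxes.
train = [Box("person", 300 + i, 250, 30, 60) for i in range(50)]
model = TinyBoxBN(classes=("person", "car")).fit(train)

normal = Box("person", 310, 250, 30, 60)   # similar to the training data
odd = Box("car", 20, 20, 200, 120)         # unseen class, place, and size
print(model.probability(normal), model.probability(odd))
```

Because each factor is a plain lookup in a count table, a low score can be traced back to the factor that caused it (for example, an unseen class or an unusual position for that class). This is one simple way to read the interpretability claim, though the paper's own explanation mechanism may differ.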