Abstract: In this study, we formulate the task of Video Anomaly Detection as a probabilistic analysis of object bounding boxes. We hypothesize that representing objects by their bounding boxes alone can be sufficient to successfully identify anomalous events in a scene. The implied benefits of this approach are increased object anonymization, faster model training, and lower computational requirements. This can particularly benefit video surveillance applications running on edge devices such as cameras. We design our model based on human reasoning, which lends itself to explaining model output in human-understandable terms. Meanwhile, the slowest model trains in less than 7 seconds on an 11th Generation Intel Core i9 Processor. While our approach constitutes a drastic reduction of the problem feature space in comparison with prior art, we show that this does not result in a reduction in performance: the results we report are highly competitive on the benchmark datasets CUHK Avenue and ShanghaiTech, and significantly exceed the latest state-of-the-art results on StreetScene, which has so far proven to be the most challenging VAD dataset.

What problem does this paper attempt to address?

This paper addresses several key challenges in video anomaly detection (VAD):

1. **Simplifying problem representation**: The authors hypothesize that anomalous events in a scene can be identified from object bounding boxes alone. The advantages of this representation are:
   - Increased object anonymization
   - Faster model training
   - Lower demand for computing resources
2. **Improving applicability on edge devices**: For video surveillance applications running on edge devices (such as cameras), the simplified representation can significantly reduce the need for high-performance hardware.
3. **Maintaining or improving performance**: Although the feature space is greatly reduced, the authors aim to show that detection performance does not degrade and that results on the benchmark datasets are comparable to or better than those of existing methods.

### Research background and motivation

Traditional VAD methods usually rely on complex deep-learning models, which require large amounts of computing resources for training and inference. In practical video surveillance scenarios, especially on edge devices, such resource-intensive models are difficult to deploy. Researchers have therefore been looking for more efficient and lightweight solutions.

### Main contributions

The main contributions of this paper can be summarized as follows:

1. **A new perspective on problem domain representation**: A new method for video anomaly detection that uses only object bounding boxes is proposed, reducing the need for complex feature extraction.
2. **Fully object-centric VAD based on discrete Bayesian networks**: Visual anomalies in the video stream are detected by learning high-dimensional bounding box attributes.
3. **Interpretability**: The model output can be explained in human-understandable terms, making the results more transparent and credible.
4. **A new SOTA baseline**: The method significantly surpasses the previous best results on the StreetScene dataset, with an average increase of about 4% in RBDC/TBDC scores, and achieves competitive results on the CUHK Avenue and ShanghaiTech datasets.

### Method overview

The method consists of two modules:

- **Image pre-processing module**: The multi-object tracker (MOT) BoT-SORT is used for object tracking to generate bounding boxes.
- **Video anomaly detection module**: A discrete Bayesian network (BN) analyzes the spatio-temporal attributes of the bounding boxes and outputs a probability score for each bounding box, indicating how likely the object is to appear in the scene.

In this way, the authors reduce the VAD task to a probabilistic graphical model problem and demonstrate its effectiveness on multiple benchmark datasets.

### Conclusion

This paper proposes a novel and efficient video anomaly detection method that simplifies the problem representation and improves both the interpretability of the model and its applicability on edge devices, while maintaining excellent detection performance.
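
To make the video anomaly detection module from the method overview more concrete, the following is a minimal sketch of scoring bounding boxes with a small discrete Bayesian network. It is not the authors' model: the network structure (`cell -> size_bin`, `cell -> cls`), the choice of attributes, the 4x4 grid, the three size bins, the 640x360 frame size, the class names, and the Laplace smoothing constant `alpha` are all assumptions made for illustration, and the BoT-SORT tracking step is replaced by hand-written boxes.

```python
from collections import Counter
import math


def discretize(box, frame_w=640, frame_h=360, grid=4, size_bins=3):
    """Map a box (x1, y1, x2, y2) to discrete (cell, size_bin) values.

    Frame size, grid resolution and bin count are illustrative assumptions.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    col = min(int(cx / frame_w * grid), grid - 1)
    row = min(int(cy / frame_h * grid), grid - 1)
    area_frac = (x2 - x1) * (y2 - y1) / (frame_w * frame_h)
    # Square-root scaling spreads small boxes over the bins more evenly.
    size_bin = min(int(math.sqrt(area_frac) * size_bins), size_bins - 1)
    return row * grid + col, size_bin


class BoxBayesNet:
    """Toy discrete BN with a hypothetical structure: cell -> size_bin, cell -> cls."""

    def __init__(self, n_cells=16, n_sizes=3, classes=("person", "car"), alpha=1.0):
        self.n_cells, self.n_sizes = n_cells, n_sizes
        self.classes, self.alpha = classes, alpha
        self.cell = Counter()             # counts for P(cell)
        self.size_given_cell = Counter()  # counts for P(size_bin | cell)
        self.cls_given_cell = Counter()   # counts for P(cls | cell)
        self.total = 0

    def fit(self, samples):
        """Estimate the conditional probability tables by counting normal boxes."""
        for cell, size_bin, cls in samples:
            self.cell[cell] += 1
            self.size_given_cell[(cell, size_bin)] += 1
            self.cls_given_cell[(cell, cls)] += 1
            self.total += 1
        return self

    def log_prob(self, cell, size_bin, cls):
        """Log of the Laplace-smoothed joint P(cell) P(size_bin|cell) P(cls|cell)."""
        a = self.alpha
        n_cell = self.cell[cell]
        p_cell = (n_cell + a) / (self.total + a * self.n_cells)
        p_size = (self.size_given_cell[(cell, size_bin)] + a) / (n_cell + a * self.n_sizes)
        p_cls = (self.cls_given_cell[(cell, cls)] + a) / (n_cell + a * len(self.classes))
        return math.log(p_cell) + math.log(p_size) + math.log(p_cls)


# Hypothetical "normal" training data: people on a walkway, cars on a road.
normal = [discretize((300, 250, 340, 340)) + ("person",) for _ in range(50)]
normal += [discretize((100, 100, 260, 200)) + ("car",) for _ in range(50)]
model = BoxBayesNet().fit(normal)

walkway_box = (300, 250, 340, 340)
seen = model.log_prob(*discretize(walkway_box), "person")  # matches training data
unseen = model.log_prob(*discretize(walkway_box), "car")   # car on the walkway
print(f"person on walkway: {seen:.2f}, car on walkway: {unseen:.2f}")
```

A lower log-probability marks a box as less expected under the training data, so thresholding this score would flag the car on the walkway. Because the score is a sum of per-factor terms, the factor that contributes most to a low score (here the class given the location) also suggests a human-readable reason for the flag, which is the kind of interpretability the summary refers to.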

Bounding Boxes and Probabilistic Graphical Models: Video Anomaly Detection Simplified

Video Anomaly Detection Based on Spatio-Temporal Relationships among Objects

Video anomaly detection based on scene classification

Spatial-based Bayesian Hidden Markov Models with Dirichlet Mixtures for Video Anomaly Detection

Cognition Guided Video Anomaly Detection Framework for Surveillance Services

Robust Unsupervised Video Anomaly Detection by Multipath Frame Prediction

Anomalies cannot materialize or vanish out of thin air: A hierarchical multiple instance learning with position-scale awareness for video anomaly detection

MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection

Robust Unsupervised Video Anomaly Detection by Multi-Path Frame Prediction

Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation: A Unified Approach

AD-Graph: Weakly Supervised Anomaly Detection Graph Neural Network

Robust Video Anomaly Detection Framework Via Prior Knowledge and Multi-Path Frame Prediction

Making Anomalies More Anomalous: Video Anomaly Detection Using a Novel Generator and Destroyer

Approaches Toward Physical and General Video Anomaly Detection

Decoupled appearance and motion learning for efficient anomaly detection in surveillance video

A Lightweight Video Anomaly Detection Model with Weak Supervision and Adaptive Instance Selection

Generate anomalies from normal: a partial pseudo-anomaly augmented approach for video anomaly detection

Real-world Video Anomaly Detection by Extracting Salient Features in Videos

Object-Guided and Motion-Refined Attention Network for Video Anomaly Detection

Distilling Aggregated Knowledge for Weakly-Supervised Video Anomaly Detection

Multi-Channel Generative Framework and Supervised Learning for Anomaly Detection in Surveillance Videos