Transformer Based Spatial-Temporal Extraction Model for Video Anomaly Detection

Zhiqiang Wang, Xiaojing Gu, Xingsheng Gu
DOI: https://doi.org/10.1109/icrca60878.2024.10649355
2024-01-01
Abstract: The objective of video anomaly detection is to pinpoint abnormal events in surveillance videos captured by vision sensors. This technique holds significant value in fields such as public safety and industrial production process monitoring. Video data is rich in spatial and temporal information, and the key to enhancing detection performance lies in using these features jointly. Current methods extract spatial and temporal features separately and merge them in the latent space. However, these methods overlook the continuity of the video. To address this issue, we propose a model that fuses consecutive and differential spatial-temporal features. This model generates new data containing both consecutive and differential features between different frames of the input video clips. Given that anomalies are unrelated to the background, we perform object detection to mitigate the background's influence. Subsequently, we introduce static filtering to eliminate static objects that contain confusing optical flow. Comprehensive experiments demonstrate that our proposed method delivers outstanding performance on benchmark datasets.
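The abstract does not give implementation details, so the following is only a minimal sketch of two of the ideas it names: pairing each frame with its difference to the next frame (consecutive and differential features), and discarding detected objects that do not change between frames (static filtering). The function names, the grayscale input, the use of a mean absolute frame difference as the "static" criterion, and the `min_change` threshold are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def consecutive_and_differential(clip):
    """Pair each frame with its difference to the following frame.

    clip: array of shape (T, H, W), grayscale frames (assumed input format).
    Returns an array of shape (T - 1, 2, H, W):
      channel 0 holds the frame itself (consecutive feature),
      channel 1 holds next_frame - frame (differential feature).
    """
    clip = np.asarray(clip, dtype=np.float32)
    consecutive = clip[:-1]
    differential = clip[1:] - clip[:-1]
    return np.stack([consecutive, differential], axis=1)


def static_filter(boxes, frame_a, frame_b, min_change=0.5):
    """Keep only boxes whose content changes between two frames.

    boxes: iterable of (x1, y1, x2, y2) integer pixel boxes, e.g. from any
           object detector (hypothetical upstream step).
    min_change: assumed threshold on the mean absolute pixel difference.
    """
    diff = np.abs(np.asarray(frame_b, dtype=np.float32)
                  - np.asarray(frame_a, dtype=np.float32))
    kept = []
    for x1, y1, x2, y2 in boxes:
        region = diff[y1:y2, x1:x2]
        # Skip empty regions; drop boxes that look static under the threshold.
        if region.size and region.mean() >= min_change:
            kept.append((x1, y1, x2, y2))
    return kept


# Toy clip: a 2x2 patch appears in the top-left corner in the middle frame.
clip = np.zeros((3, 4, 4), dtype=np.float32)
clip[1, 0:2, 0:2] = 1.0

features = consecutive_and_differential(clip)
boxes = [(0, 0, 2, 2), (2, 2, 4, 4)]
kept = static_filter(boxes, clip[0], clip[1])
```

In this toy example the first box covers the patch that changes and is kept, while the second box covers an unchanged region and is filtered out. A real pipeline would feed the surviving object regions to the spatial-temporal model described in the abstract.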