TransAnomaly: Video Anomaly Detection Using Video Vision Transformer

Hongchun Yuan, Zhenyu Cai, Hui Zhou, Yue Wang, Xiangzhi Chen
DOI: https://doi.org/10.1109/access.2021.3109102
IF: 3.9
2021-01-01
IEEE Access
Abstract: Video anomaly detection is challenging because abnormal events in real scenes are unbounded, rare, equivocal, and irregular. In recent years, transformers have demonstrated powerful modelling abilities for sequence data, so we apply them to video anomaly detection. In this paper, we propose a prediction-based video anomaly detection approach named TransAnomaly. Our model combines the U-Net and the Video Vision Transformer (ViViT) to capture richer temporal information and more global context. To make full use of the ViViT, we modify it so that it is capable of video prediction. Experiments on benchmark datasets show that adding the transformer module improves anomaly detection performance. In addition, we calculate regularity scores with sliding windows and evaluate the impact of different window sizes and strides. With proper settings, our model outperforms other state-of-the-art prediction-based video anomaly detection approaches. Furthermore, our model can perform anomaly localization by tracking the location of patches with lower regularity scores.
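The sliding-window regularity score and patch-based localization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so this example assumes a patch-level PSNR between a predicted frame and the real frame, with frames as 2D lists of pixel values in [0, 1]. The function names `patch_scores` and `least_regular_patch` and the `window`, `stride`, and `max_val` parameters are hypothetical and are not taken from the paper.

```python
import math


def patch_scores(pred, target, window, stride, max_val=1.0):
    """Slide a square window over two frames and score each patch.

    Assumption: the score is the PSNR of the prediction error inside
    the patch, so a lower score means a less regular patch.
    Returns a list of ((row, col), score) pairs, where (row, col) is
    the top-left corner of the patch.
    """
    height, width = len(target), len(target[0])
    results = []
    for r in range(0, height - window + 1, stride):
        for c in range(0, width - window + 1, stride):
            sq_err = 0.0
            for i in range(r, r + window):
                for j in range(c, c + window):
                    sq_err += (pred[i][j] - target[i][j]) ** 2
            mse = sq_err / (window * window)
            # A perfectly predicted patch has unbounded PSNR.
            if mse == 0:
                score = float("inf")
            else:
                score = 10.0 * math.log10(max_val ** 2 / mse)
            results.append(((r, c), score))
    return results


def least_regular_patch(pred, target, window, stride):
    """Return the (location, score) of the patch with the lowest score."""
    return min(patch_scores(pred, target, window, stride),
               key=lambda item: item[1])


# Usage: a 4x4 frame predicted perfectly except for one pixel.
target = [[0.0] * 4 for _ in range(4)]
pred = [[0.0] * 4 for _ in range(4)]
pred[2][3] = 0.5
location, score = least_regular_patch(pred, target, window=2, stride=2)
```

Under these assumptions, the lowest patch score could serve as the frame-level regularity score, and the location of that patch gives a coarse anomaly localization. Smaller windows and strides give finer localization at a higher computational cost.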
computer science, information systems; telecommunications; engineering, electrical & electronic