Detecting Anomalous Events from Unlabeled Videos Via Temporal Masked Auto-Encoding

Jingtao Hu, Guang Yu, Siqi Wang, En Zhu, Zhiping Cai, Xinzhong Zhu
DOI: https://doi.org/10.1109/icme52920.2022.9859873
2022-01-01
Abstract: Unsupervised video anomaly detection (UVAD) aims to discern anomalous events from fully unlabeled videos. However, existing UVAD methods suffer from poor performance. Inspired by the recent masked autoencoder (MAE) [1], we propose Temporal Masked Auto-Encoding (TMAE) as an effective end-to-end UVAD method. Specifically, we first represent video events as spatial-temporal cubes (STCs), which are built from temporally consecutive foreground patches in unlabeled videos. Then, half of the patches in an STC are masked along the temporal dimension, and a vision transformer (ViT) is trained to predict the masked patches from the unmasked ones. Because anomalies are rare and unusual, anomalous events are predicted more poorly, which lets us discriminate anomalies in unlabeled videos and compute anomaly scores. Furthermore, to exploit motion cues in videos, we also apply TMAE to optical flow, which further boosts performance. Experiments show that TMAE outperforms existing UVAD methods by a notable margin (3.9%–6.6% AUC).
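To make the masking-and-scoring idea concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. The abstract does not say how the masked half is chosen or how prediction quality becomes a score, so random selection and mean squared error are both assumptions here. The function names (`temporal_mask`, `mean_predictor`, `anomaly_score`) are hypothetical, and `mean_predictor` is only a placeholder for the trained ViT.

```python
import numpy as np


def temporal_mask(num_frames, mask_ratio=0.5, rng=None):
    """Return a boolean array of length num_frames; True marks a masked patch.

    Assumption: masked positions are drawn at random along the temporal axis.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_masked = int(round(num_frames * mask_ratio))
    mask = np.zeros(num_frames, dtype=bool)
    mask[rng.choice(num_frames, size=num_masked, replace=False)] = True
    return mask


def mean_predictor(visible):
    """Placeholder for the trained ViT (assumption, not the paper's model).

    Predicts every masked patch as the average of the visible patches.
    """
    return visible.mean(axis=0)


def anomaly_score(stc, mask, predictor=mean_predictor):
    """Score an STC of shape (T, H, W) by its masked-patch prediction error.

    Assumption: the score is the mean squared error over masked patches.
    """
    visible = stc[~mask]          # patches the predictor is allowed to see
    target = stc[mask]            # patches it has to reconstruct
    prediction = predictor(visible)  # shape (H, W), broadcast over targets
    return float(np.mean((target - prediction) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stc = rng.random((8, 16, 16))  # toy STC: 8 consecutive 16x16 patches
    mask = temporal_mask(stc.shape[0], rng=rng)
    print("masked positions:", np.flatnonzero(mask))
    print("anomaly score:", anomaly_score(stc, mask))
```

In this sketch a higher score means a worse prediction, which matches the abstract's reasoning that anomalous events are harder to predict. The same functions could be applied to a cube of optical-flow patches instead of raw pixels, again only as an illustration of the idea.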