Temporal-Aware Self-Supervised Learning for Unsupervised Video Anomaly Detection

Guoqian Shang, Chao Huang, Jingyong Su, Yong Xu
DOI: https://doi.org/10.1109/acait53529.2021.9731141
2021-01-01
Abstract: Video anomaly detection (VAD) is commonly formulated as the discrimination of events that do not conform to the regular patterns in videos. Recently, deep neural network-based VAD approaches have made remarkable progress. Existing unsupervised approaches usually perform VAD by reconstructing or predicting frames and then identifying anomalies from the reconstruction or prediction errors. However, these approaches suffer from two limitations: (1) They cannot obtain the semantic features of normal training samples. (2) They are suboptimal because the proxy task is not aligned with the actual task. To address these issues, we present a novel temporal-aware self-supervised learning framework that obtains high-level semantic features and performs VAD by solving multiple pretext tasks. In particular, we use temporal transformations to form multiple pretext tasks (transformation prediction) for VAD. A 3D encoder is trained to obtain semantic features by jointly solving these pretext tasks. Multiple task heads then use these features to solve the different pretext tasks. In the inference phase, the losses of the multiple tasks are used to calculate the final anomaly score. Extensive experiments on two benchmarks show that the proposed method outperforms state-of-the-art methods.
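The scoring idea in the abstract, summing pretext-task losses at inference time, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the temporal transformations, so `reverse`, `skip_frames`, and `shuffle` are hypothetical stand-ins; `predict_probs` is a placeholder for the trained 3D encoder plus task heads; and the multiple heads are collapsed into a single classifier over transformation labels for brevity.

```python
import math
import random

# Hypothetical temporal transformations (not taken from the paper).
def identity(clip):
    return list(clip)

def reverse(clip):
    # Play the clip backwards.
    return list(reversed(clip))

def skip_frames(clip, stride=2):
    # Simulate faster playback by keeping every `stride`-th frame.
    return list(clip[::stride])

def shuffle(clip, seed=0):
    # Destroy temporal order with a seeded, reproducible permutation.
    rng = random.Random(seed)
    out = list(clip)
    rng.shuffle(out)
    return out

TRANSFORMS = [identity, reverse, skip_frames, shuffle]

def cross_entropy(probs, label):
    # Negative log-likelihood of the true transformation label.
    return -math.log(max(probs[label], 1e-12))

def anomaly_score(clip, predict_probs):
    """Sum the pretext-task losses over all transformations of one clip.

    `predict_probs` is an assumed callable mapping a transformed clip to a
    probability distribution over the indices of TRANSFORMS.
    """
    return sum(
        cross_entropy(predict_probs(transform(clip)), label)
        for label, transform in enumerate(TRANSFORMS)
    )
```

Under these assumptions, a model that reliably recognises the transformations applied to normal clips yields a low summed loss, so a higher score would suggest that a clip departs from the patterns seen in training.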