Learning Attention Augmented Spatial-temporal Normality for Video Anomaly Detection

Yang Liu, Shuang Li, Jing Liu, Hao Yang, Mengyang Zhao, Xinhua Zeng, Wei Ni, Liang Song
DOI: https://doi.org/10.1109/ishc54333.2021.00034
2021-01-01
Abstract: Video anomaly detection, which aims to automatically detect and localize abnormal events in videos, is an essential and challenging task in computer vision. In this paper, we propose an attention augmented spatial-temporal normality learning framework to explore the unique spatial and temporal patterns of normal events. Specifically, we first slice the videos into local spatial-temporal cubes along the spatial and temporal dimensions to facilitate independent learning of the prototypical spatial and temporal patterns of normal videos. In the training phase, we use parallel deep convolutional neural networks to learn the spatial features of each cube and introduce an attention module to guide the model to focus on the important local cubes. Then, to exploit the complementary information of adjacent video fragments in the temporal dimension, we use a convolutional long short-term memory (ConvLSTM) network to model temporal patterns. In the testing phase, we calculate the prediction errors of the salient areas and compute the anomaly score by measuring the difference between the testing samples and the learned spatial-temporal normality. Experimental results on standard benchmarks show that the proposed method achieves performance comparable to state-of-the-art methods, with frame-level AUCs of 96.6%, 85.2%, and 68.8% on UCSD Ped2, CUHK Avenue, and ShanghaiTech, respectively.
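The cube slicing and scoring steps described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names slice_into_cubes and normalize_scores, the non-overlapping grid, the cube sizes, and the min-max normalization of per-frame errors are all assumptions made for illustration. The abstract specifies none of them.

```python
def slice_into_cubes(video, t_len, patch):
    """Split a video into non-overlapping spatial-temporal cubes.

    video: nested list indexed as video[t][row][col] (grayscale frames).
    Returns a dict mapping (ti, hi, wi) grid positions to cubes of
    shape t_len x patch x patch. Frames or pixels that do not fill a
    whole cube are dropped (an assumption of this sketch).
    """
    n_frames, height, width = len(video), len(video[0]), len(video[0][0])
    cubes = {}
    for ti in range(n_frames // t_len):
        for hi in range(height // patch):
            for wi in range(width // patch):
                cubes[(ti, hi, wi)] = [
                    [row[wi * patch:(wi + 1) * patch]
                     for row in frame[hi * patch:(hi + 1) * patch]]
                    for frame in video[ti * t_len:(ti + 1) * t_len]
                ]
    return cubes


def normalize_scores(errors):
    """Min-max normalize per-frame prediction errors to [0, 1].

    Assumed scoring convention: a higher value means a more anomalous frame.
    """
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # All frames have equal error, so none stands out.
        return [0.0 for _ in errors]
    return [(e - lo) / (hi - lo) for e in errors]
```

In such a setup, each cube would be passed to a spatial feature extractor, and the per-frame errors fed to normalize_scores would come from the trained predictor; both are outside the scope of this sketch.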