Memory-Augmented Spatial-Temporal Consistency Network for Video Anomaly Detection.

Zhangxun Li, Mengyang Zhao, Xinhua Zeng, Tian Wang, Chengxin Pang
DOI: https://doi.org/10.1007/978-981-99-8537-1_8
2024-01-01
Abstract: Video anomaly detection (VAD) in intelligent surveillance systems is a crucial yet highly challenging task. Since appearance and motion information is vital for identifying anomalies, existing unsupervised VAD methods usually learn normality from them. However, these approaches tend to consider appearance and motion separately or simply integrate them while ignoring the consistency between them, resulting in sub-optimal performance. To address this problem, we propose a Memory-Augmented Spatial-Temporal Consistency Network, aiming to model the latent consistency between spatial appearance and temporal motion by learning the unified spatiotemporal representation. Additionally, we introduce a spatial-temporal memory fusion module to record spatial and temporal prototypes of regular patterns from the unified spatiotemporal representation, increasing the gap between normal and abnormal events in the feature space. Experimental results on three benchmarks demonstrate the effectiveness of the spatial-temporal consistency for VAD tasks. Our method performs comparably to the state-of-the-art methods with AUCs of 97.6%, 89.3%, and 73.3% on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets, respectively.
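The abstract does not specify how the memory module stores or reads prototypes. As a rough illustration of the general idea behind memory-augmented anomaly detection, the sketch below reads from a small bank of "normal" prototypes with softmax-weighted cosine similarity and scores a feature by its distance to the read-out. This is a generic sketch, not the paper's spatial-temporal memory fusion module; the names `memory_read` and `anomaly_score`, the toy prototypes, and the choice of cosine addressing are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)

def memory_read(query, prototypes):
    # Hypothetical read operation: softmax over cosine similarities,
    # then a weighted sum of the stored prototypes.
    sims = [cosine(query, p) for p in prototypes]
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]  # subtract max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(query)
    read = [sum(w * p[i] for w, p in zip(weights, prototypes)) for i in range(dim)]
    return read, weights

def anomaly_score(query, prototypes):
    # Assumed scoring rule: Euclidean distance between the feature and
    # what the memory of normal patterns can reproduce for it.
    read, _ = memory_read(query, prototypes)
    return math.sqrt(sum((q - r) ** 2 for q, r in zip(query, read)))

# Toy memory with two "normal" prototypes (illustrative values only).
prototypes = [[1.0, 0.0], [0.0, 1.0]]
normal_score = anomaly_score([0.9, 0.1], prototypes)
abnormal_score = anomaly_score([-1.0, -1.0], prototypes)
```

In this toy setup a feature close to a stored prototype is reproduced well and gets a low score, while a feature unlike any prototype gets a higher one, which is the kind of gap between normal and abnormal events that the abstract describes.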