Spatial-Temporal Graph Convolutional Network Boosted Flow-Frame Prediction for Video Anomaly Detection

Kai Cheng, Xinhua Zeng, Yang Liu, Mengyang Zhao, Chengxin Pang, Xing Hu
DOI: https://doi.org/10.1109/icassp49357.2023.10095170
2023-01-01
Abstract: Video Anomaly Detection (VAD) is a critical technology for intelligent surveillance systems and remains a challenging task in the signal processing community. An intuitive idea for VAD is to use a two-stream network to learn appearance normality and motion normality separately. However, existing approaches usually put their design effort into the network architecture for the appearance stream and then apply a similar architecture to the motion stream, ignoring the distinct characteristics of appearance and motion. In this paper, we propose STGCN-FFP, an unsupervised Flow-Frame Prediction model boosted by a Spatial-Temporal Graph Convolutional Network (STGCN). Specifically, we first design an STGCN-based memory module to extract and memorize normal patterns from optical flow, which is better suited to learning motion normality. Then, we use a memory-augmented auto-encoder to model normal appearance patterns. Finally, the latent representations of the two streams are fused to predict future frames, which pushes the model to learn spatial-temporal normality. To our knowledge, STGCN-FFP is the first work to apply STGCN specifically to modeling motion normality. Our method performs comparably to state-of-the-art methods on three benchmarks.
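The abstract does not say how the memory modules are addressed or how the two latent representations are fused. The following is only a minimal NumPy sketch of one common choice in memory-augmented auto-encoders: a query is compared to stored items by cosine similarity, the similarities are turned into softmax weights, and the read-out is the weighted sum of the items. The names `memory_read`, `fuse`, and `temperature`, and the use of concatenation for fusion, are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def memory_read(query, memory, temperature=1.0):
    """Read from a memory bank with softmax attention (assumed scheme).

    query:  latent vector of shape (d,)
    memory: stored "normal pattern" items of shape (n_items, d)
    Returns the read-out vector (d,) and the attention weights (n_items,).
    """
    eps = 1e-12
    # Cosine similarity between the query and every memory item.
    q = query / (np.linalg.norm(query) + eps)
    m = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + eps)
    sim = m @ q
    # Numerically stable softmax over the memory items.
    logits = sim / temperature
    logits = logits - logits.max()
    weights = np.exp(logits)
    weights = weights / weights.sum()
    # The read-out is a convex combination of the stored items.
    return weights @ memory, weights


def fuse(appearance_latent, motion_latent):
    """Fuse the two streams by concatenation (an assumption, not the paper's stated method)."""
    return np.concatenate([appearance_latent, motion_latent])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    appearance_memory = rng.normal(size=(5, 8))  # hypothetical sizes
    motion_memory = rng.normal(size=(5, 8))
    a_read, _ = memory_read(rng.normal(size=8), appearance_memory)
    m_read, _ = memory_read(rng.normal(size=8), motion_memory)
    print(fuse(a_read, m_read).shape)
```

In a design like this, a query that resembles a stored normal pattern is reconstructed almost unchanged, while an anomalous query is pulled toward normal items, so the predicted future frame deviates from the observed one and the prediction error can serve as an anomaly score.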