Memory Enhanced Spatial-Temporal Graph Convolutional Autoencoder for Human-Related Video Anomaly Detection.

Sibo Luo, Shangshang Wang, Yuan Wu, Cheng Jin
DOI: https://doi.org/10.1007/978-3-031-18913-5_51
2022-01-01
Abstract: Human-related video anomaly detection is a challenging problem because anomalies are not clearly defined and training data are insufficient. Pose-based methods have attracted widespread attention because they exploit highly structured skeleton data, which are robust to background noise and illumination changes. However, existing methods use recurrent neural networks to extract temporal information while ignoring the spatial dependencies between skeleton joints, which are crucial for reasoning about behaviors. In addition, commonly used methods rely on anomalies producing larger reconstruction errors than normal samples. In practice, however, because these models generalize strongly, they fail to produce significantly larger reconstruction errors for abnormal samples, so anomalies are missed. In this paper, we propose a novel framework, the Memory Enhanced Spatial-Temporal Graph Convolutional Autoencoder (Mem-STGCAE), to address these problems. We use spatial-temporal graph convolution as the encoder to capture discriminative features in both the spatial and temporal domains. We enhance the autoencoder with a memory module that records normal patterns, and the encoded representation is used as a query to retrieve the most relevant of these patterns. The decoder therefore reconstructs anomalies from normal patterns, which yields significant reconstruction errors. Unlike traditional autoencoders, the decoder has two branches, which reconstruct past pose sequences and predict future pose sequences, respectively. Extensive experiments on two challenging video anomaly datasets, ShanghaiTech and IITB-Corridor, show that the proposed network outperforms other state-of-the-art methods.
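The abstract says the encoded representation queries a memory of normal patterns, so the decoder can only rebuild inputs from normal patterns. The sketch below illustrates that idea under stated assumptions. The abstract does not give the addressing function, so `memory_read` assumes a MemAE-style softmax over cosine similarities, sharpened by a hypothetical `temperature` parameter. The decoder is replaced by the identity, so the error between a query and its retrieved feature stands in for reconstruction error. `anomaly_score`, with its hypothetical weight `alpha` over the past-reconstruction and future-prediction branches, is illustrative only and is not the paper's scoring formula. Plain Python is used so the sketch has no dependencies.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def _normalize(v: Sequence[float], eps: float = 1e-12) -> Vector:
    """Scale a vector to unit length (eps guards against a zero vector)."""
    norm = math.sqrt(sum(x * x for x in v)) + eps
    return [x / norm for x in v]


def memory_read(query: Sequence[float], memory: Sequence[Sequence[float]],
                temperature: float = 0.1) -> Tuple[Vector, Vector]:
    """Return (retrieved feature, addressing weights) for one query vector.

    Assumed MemAE-style addressing: softmax over cosine similarities between
    the query and each stored normal pattern; `temperature` is hypothetical.
    """
    q = _normalize(query)
    logits = [sum(a * b for a, b in zip(q, _normalize(item))) / temperature
              for item in memory]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The output is a convex combination of stored normal patterns only,
    # so an abnormal query cannot be reproduced faithfully.
    retrieved = [sum(w * item[d] for w, item in zip(weights, memory))
                 for d in range(len(query))]
    return retrieved, weights


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def anomaly_score(past, past_rec, future, future_pred, alpha: float = 0.5) -> float:
    """Hypothetical score mixing the reconstruction and prediction branch errors."""
    return alpha * mse(past, past_rec) + (1 - alpha) * mse(future, future_pred)


# Toy usage: four orthogonal "normal" patterns stored in memory.
memory = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
normal_q = [1.0, 0.0, 0.0, 0.0]        # matches a stored pattern
abnormal_q = [-1.0, -1.0, -1.0, -1.0]  # unlike any stored pattern
for name, q in (("normal", normal_q), ("abnormal", abnormal_q)):
    retrieved, _ = memory_read(q, memory)
    print(name, mse(q, retrieved))
```

In this toy setting the normal query is recovered almost exactly, while the abnormal query is replaced by an average of the stored patterns and so receives a much larger error, which is the effect the memory module is meant to produce.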