Abstract: Video anomaly detection aims to find events in a video that do not conform to expected behavior. Prevalent methods mainly detect anomalies through snippet reconstruction error or future frame prediction error. However, this error depends heavily on the local context of the current snippet and lacks an understanding of normality. To address this issue, we propose to detect anomalous events not only from the local context, but also from the consistency between the testing event and the knowledge about normality learned from the training data. Concretely, we propose a novel two-stream framework based on context recovery and knowledge retrieval, in which the two streams complement each other. For the context recovery stream, we propose a spatiotemporal U-Net that fully utilizes motion information to predict the future frame. Furthermore, we propose a maximum local error mechanism to alleviate the problem of large recovery errors caused by complex foreground objects. For the knowledge retrieval stream, we propose an improved learnable locality-sensitive hashing, which optimizes the hash functions via a Siamese network and a mutual difference loss. The knowledge about normality is encoded and stored in hash tables, and the distance between the testing event and the knowledge representation is used to reveal the probability of anomaly. Finally, we fuse the anomaly scores from the two streams to detect anomalies. Extensive experiments demonstrate the effectiveness and complementarity of the two streams, whereby the proposed two-stream framework achieves state-of-the-art performance on the ShanghaiTech, Avenue, and Corridor datasets among methods that do not use object detection. Even compared with methods that use object detection, our method reaches competitive or better performance on the ShanghaiTech, Avenue, and Ped2 datasets.
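
The two scoring ideas named in the abstract, a maximum local error for the recovery stream and a fusion of the two streams' anomaly scores, can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function names, the stride-1 square window, the min-max normalization, and the weighted-sum fusion are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only; not the authors' implementation.
# Frames are plain 2D lists of floats (one grayscale channel).


def max_local_error(pred, target, patch=2):
    """Return the largest mean squared error over all patch x patch windows.

    Assumption: windows slide with stride 1. Taking the worst local window,
    rather than the whole-frame average, keeps a small anomalous region from
    being averaged away by a well-predicted background.
    """
    height, width = len(pred), len(pred[0])
    if patch > height or patch > width:
        raise ValueError("patch must fit inside the frame")
    best = 0.0
    for top in range(height - patch + 1):
        for left in range(width - patch + 1):
            total = 0.0
            for i in range(top, top + patch):
                for j in range(left, left + patch):
                    diff = pred[i][j] - target[i][j]
                    total += diff * diff
            best = max(best, total / (patch * patch))
    return best


def min_max_normalize(scores):
    """Rescale a list of scores to [0, 1]; a constant list maps to zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(recovery_errors, retrieval_distances, weight=0.5):
    """Combine per-frame scores from both streams.

    Assumption: a weighted sum of min-max normalized scores, where `weight`
    is a hypothetical balance parameter for the recovery stream.
    """
    if len(recovery_errors) != len(retrieval_distances):
        raise ValueError("both streams must score the same frames")
    recovery = min_max_normalize(recovery_errors)
    retrieval = min_max_normalize(retrieval_distances)
    return [weight * r + (1.0 - weight) * k for r, k in zip(recovery, retrieval)]


if __name__ == "__main__":
    target = [[0.0] * 4 for _ in range(4)]
    pred = [row[:] for row in target]
    pred[1][1] = 1.0  # a single badly predicted pixel
    print(max_local_error(pred, target, patch=2))
    print(fuse_scores([1.0, 2.0, 3.0], [0.0, 0.0, 4.0]))
```

In this toy frame, a single mispredicted pixel dominates its 2 x 2 window while contributing little to the whole-frame average, which is the intuition behind scoring by the worst local region. In the described framework the retrieval distances would come from the hash-table lookup; here they are supplied as plain numbers.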

Learning Appearance-motion Normality for Video Anomaly Detection

Appearance-Motion united Auto-Encoder Framework for Video Anomaly Detection

Learning Attention Augmented Spatial-temporal Normality for Video Anomaly Detection

Learning Appearance-Motion Synergy Via Memory-Guided Event Prediction for Video Anomaly Detection

Memory-Augmented Spatial-Temporal Consistency Network for Video Anomaly Detection

Video Anomaly Detection Based on Spatio-Temporal Relationships among Objects

Video Anomaly Detection By The Duality Of Normality-Granted Optical Flow

Spatiotemporal consistency-enhanced network for video anomaly detection

Stochastic video normality network for abnormal event detection in surveillance videos

Collaborative Normality Learning Framework for Weakly Supervised Video Anomaly Detection

Memory Enhanced Spatial-Temporal Graph Convolutional Autoencoder for Human-Related Video Anomaly Detection

Learning Task-Specific Representation for Video Anomaly Detection with Spatial-Temporal Attention

Decoupled appearance and motion learning for efficient anomaly detection in surveillance video

A Two-Branch Network for Video Anomaly Detection with Spatio-Temporal Feature Learning

Appearance-Motion Memory Consistency Network for Video Anomaly Detection

Tam-Net: Temporal Enhanced Appearance-To-Motion Generative Network For Video Anomaly Detection

Appearance Blur-driven AutoEncoder and Motion-guided Memory Module for Video Anomaly Detection

Channel based approach via faster dual prediction network for video anomaly detection

Attention-based residual autoencoder for video anomaly detection

Context Recovery and Knowledge Retrieval: A Novel Two-Stream Framework for Video Anomaly Detection

Normality learning reinforcement for anomaly detection in surveillance videos