Abstract: Video anomaly detection is challenging because abnormal events are highly diverse, which has made unsupervised learning a promising solution in recent work. Existing unsupervised methods assume that anomalies cannot be reconstructed or predicted from historical data as accurately as normal events, so the reconstruction or prediction error can serve as an indicator of anomalies. In this study, we propose to discriminate anomalies from normal events by fusing appearance and motion in a frame prediction framework. The distinguishing feature is that we embed optical flows into the framework as the cue that directs the transformation from the input frame to the predicted frame, so appearance and motion are fused naturally without any extra effort to align them. The error of predicting the next frame from the concatenation of the present frame's appearance and its associated motion then serves as the anomaly score. Notably, we compute optical flow from a single frame rather than from two consecutive frames, as is done traditionally. The goal is to make the optical flows depend on the whole training data, so that anomalies deviating markedly from the training data yield highly distorted optical flows and correspondingly high prediction errors, which is not guaranteed by traditional optical flows derived from the difference between two consecutive frames. In summary, we extend appearance-motion correspondence learning to motion-guided prediction that ties together the appearances of two consecutive frames. We also introduce a margin loss to enhance the learning of frame prediction. Experiments on widely accepted benchmarks demonstrate the state-of-the-art performance of our approach.
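To make the scoring idea concrete, the following is a minimal sketch of how a per-frame prediction error could be turned into an anomaly score. It is not the authors' implementation: the abstract does not specify the error measure or any normalisation, so the mean squared error, the per-video min-max normalisation, and the names `prediction_error` and `anomaly_scores` are all assumptions made for illustration. Frames are represented as flat sequences of pixel intensities, and the predicted frames are assumed to come from some already trained predictor.

```python
from typing import List, Sequence

# Assumption: a frame is flattened to a sequence of pixel intensities in [0, 1].
Frame = Sequence[float]


def prediction_error(predicted: Frame, actual: Frame) -> float:
    """Mean squared error between a predicted frame and the observed frame."""
    if len(predicted) != len(actual):
        raise ValueError("frames must have the same number of pixels")
    if not predicted:
        raise ValueError("frames must not be empty")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


def anomaly_scores(predicted: Sequence[Frame], actual: Sequence[Frame]) -> List[float]:
    """Min-max normalise per-frame errors over one video to scores in [0, 1].

    Assumption: normalising within a single video is a common convention in
    prediction-based detectors; the abstract does not state which one is used.
    """
    errors = [prediction_error(p, a) for p, a in zip(predicted, actual)]
    if not errors:
        return []
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # All frames were predicted equally well; no frame stands out.
        return [0.0 for _ in errors]
    return [(e - lo) / (hi - lo) for e in errors]


if __name__ == "__main__":
    # Toy video of three 2-pixel frames; only the last one is predicted badly.
    actual = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
    predicted = [[0.0, 0.0], [0.5, 0.5], [0.0, 0.0]]
    print(anomaly_scores(predicted, actual))
```

In this toy input the first two frames are predicted exactly and the third is not, so the third frame receives the highest score. A frame would be flagged as anomalous when its score exceeds a threshold chosen on validation data.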

Learning Appearance-Motion Synergy Via Memory-Guided Event Prediction for Video Anomaly Detection

Learning Appearance-motion Normality for Video Anomaly Detection

Appearance-Motion united Auto-Encoder Framework for Video Anomaly Detection

Appearance-Motion Memory Consistency Network for Video Anomaly Detection

Memory-enhanced appearance-motion consistency framework for video anomaly detection

Decoupled appearance and motion learning for efficient anomaly detection in surveillance video

Memory Enhanced Spatial-Temporal Graph Convolutional Autoencoder for Human-Related Video Anomaly Detection

Memory-Augmented Spatial-Temporal Consistency Network for Video Anomaly Detection

Learning Attention Augmented Spatial-temporal Normality for Video Anomaly Detection

Video Anomaly Detection Via Successive Image Frame Prediction Leveraging Optical Flows

Pose-Motion Video Anomaly Detection via Memory-Augmented Reconstruction and Conditional Variational Prediction

Learn Suspected Anomalies from Event Prompts for Video Anomaly Detection

Appearance Blur-driven AutoEncoder and Motion-guided Memory Module for Video Anomaly Detection

Video anomaly detection based on a multi-layer reconstruction autoencoder with a variance attention strategy

Spatiotemporal consistency-enhanced network for video anomaly detection

Comprehensive Regularization in a Bi-directional Predictive Network for Video Anomaly Detection

Integrated Multiscale Appearance Features and Motion Information Prediction Network for Anomaly Detection

Dissimilate-and-Assimilate Strategy for Video Anomaly Detection and Localization

Event-driven Weakly Supervised Video Anomaly Detection

Tam-Net: Temporal Enhanced Appearance-To-Motion Generative Network For Video Anomaly Detection

Spatiotemporal Masked Autoencoder with Multi-Memory and Skip Connections for Video Anomaly Detection