Abstract: Because the definition of an anomaly is fuzzy and real-world video scenes are complex, video anomaly detection remains a challenging task. In this work, we explore a novel lightweight dual-branch convolutional neural architecture that separates appearance and motion representations to capture spatial and temporal information, respectively, since abnormal events usually differ from normal ones in appearance or motion behavior. To address the channel redundancy of traditional neural networks, and because the two branches process different feature information, we attenuate the channels of each branch accordingly, which greatly improves the speed of anomaly detection while maintaining model performance. To make better use of key features, we insert a Channel Squeeze and Excitation module into the encoder part of the network to model channel correlations and adaptively recalibrate the channel-wise feature responses. The importance of each feature channel is learned automatically; according to that importance, useful channels are promoted and channels that are not useful for the current task are suppressed. In addition, to increase the reconstruction error of the motion encoder on abnormal events while accounting for the diversity of normal patterns, we augment the motion U-Net with a memory module whose items record prototypical patterns of the normal data. Experiments on three benchmark datasets, UCSD Ped2, CUHK Avenue, and ShanghaiTech, show that our method achieves AUC scores of 96.3%, 87.4%, and 73.5%, respectively. The method runs at 55 fps, which is competitive with current methods.
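The two mechanisms named in the abstract, channel recalibration and a memory of normal prototypes, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the random weights, the bottleneck ratio `r`, and the cosine-similarity softmax used for memory addressing are all assumptions chosen for illustration, following the common squeeze-and-excitation and memory-augmented autoencoder formulations.

```python
import math
import random


def matvec(weights, vec):
    """Multiply a nested-list matrix (rows x cols) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def squeeze_excite(features, w_reduce, w_expand):
    """Recalibrate the channels of `features` (C x H x W nested lists).

    w_reduce (C/r x C) and w_expand (C x C/r) are hypothetical weights
    that would normally be learned during training.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Excitation: bottleneck (ReLU, then sigmoid) yields gates in (0, 1).
    hidden = [max(0.0, h) for h in matvec(w_reduce, squeezed)]
    gates = [1.0 / (1.0 + math.exp(-g)) for g in matvec(w_expand, hidden)]
    # Scale: each channel is multiplied by its gate, so channels with a
    # larger learned importance keep more of their response.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(features, gates)
    ]
    return scaled, gates


def memory_read(query, memory):
    """Read from `memory` (list of prototype vectors) with soft addressing.

    Assumed addressing scheme: softmax over cosine similarity.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / (norm + 1e-12)

    sims = [cosine(query, item) for item in memory]
    peak = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The read vector is a convex combination of the stored prototypes.
    read = [
        sum(w * item[d] for w, item in zip(weights, memory))
        for d in range(len(query))
    ]
    return read, weights


# Toy usage with made-up sizes: 4 channels, 2x2 maps, bottleneck ratio 2.
random.seed(0)
C, H, W, r = 4, 2, 2, 2
features = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w_reduce = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w_expand = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
scaled, gates = squeeze_excite(features, w_reduce, w_expand)

memory = [[1.0, 0.0], [0.0, 1.0]]  # two hypothetical normal prototypes
read, weights = memory_read([1.0, 0.0], memory)
```

Because the read vector is always a mixture of stored normal prototypes, a query that resembles no prototype is reconstructed poorly, which is the usual reason a memory module raises the reconstruction error on abnormal inputs.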

Appearance-Motion united Auto-Encoder Framework for Video Anomaly Detection

Learning Appearance-motion Normality for Video Anomaly Detection

Learning Appearance-Motion Synergy Via Memory-Guided Event Prediction for Video Anomaly Detection

Memory-enhanced appearance-motion consistency framework for video anomaly detection

Video Anomaly Detection Based on Global–Local Convolutional Autoencoder

Appearance-Motion Memory Consistency Network for Video Anomaly Detection

Video anomaly detection based on a multi-layer reconstruction autoencoder with a variance attention strategy

Learning Attention Augmented Spatial-temporal Normality for Video Anomaly Detection

Spatiotemporal Masked Autoencoder with Multi-Memory and Skip Connections for Video Anomaly Detection

Video Anomaly Detection Based on Spatio-Temporal Relationships among Objects

Video Anomaly Detection Based on Attention Mechanism

Attention-based residual autoencoder for video anomaly detection

AMP-Net: Appearance-Motion Prototype Network Assisted Automatic Video Anomaly Detection System

Synthetic Pseudo Anomalies for Unsupervised Video Anomaly Detection: A Simple yet Efficient Framework based on Masked Autoencoder

Channel based approach via faster dual prediction network for video anomaly detection

Appearance Blur-driven AutoEncoder and Motion-guided Memory Module for Video Anomaly Detection

Memory Enhanced Spatial-Temporal Graph Convolutional Autoencoder for Human-Related Video Anomaly Detection

Collaborative Normality Learning Framework for Weakly Supervised Video Anomaly Detection

Memory-Augmented Spatial-Temporal Consistency Network for Video Anomaly Detection

Pedestrian Spatio-Temporal Information Fusion For Video Anomaly Detection

Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors