Abstract: Weakly supervised video anomaly detection (WSVAD) is a challenging research problem in image and video processing. Prior studies have typically formulated WSVAD as a multiple-instance learning (MIL) problem. However, many of these methods focus mainly on the time periods in which anomalies are clearly visible. They recognize anomalous events solely by detecting significant changes in appearance or motion, ignoring the temporal completeness and continuity that anomalous events inherently possess. They also disregard the subtle correlations at the transitional boundaries between normal and abnormal states. To address these issues, we propose a weakly supervised video anomaly detection approach based on a Transformer with margin learning. First, our network captures temporal changes around the occurrence of anomalies by using Transformer blocks, which are well suited to modeling long-range dependencies in anomalous events. Second, to handle hard cases, i.e., normal events that closely resemble anomalous events, we employ a hard score memory that stores the anomaly scores of hard samples, enabling iterative optimization on those instances. Third, to strengthen the discriminative capability of the model at the score level, we use pseudo-labels for anomalous events as supplementary supervision. Experiments on two large-scale datasets, ShanghaiTech and UCF-Crime, show that the proposed method is sensitive to anomalous events and performs competitively against state-of-the-art methods.
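The abstract does not give the exact loss functions, so the following is only a minimal Python sketch of the general ideas under stated assumptions. Each video is treated as an MIL bag of snippet anomaly scores (video-level labels only), and a hinge ranking loss with a `margin` pushes the top-scoring snippet of an anomalous video above the top-scoring snippet of a normal video. `HardScoreMemory` is a hypothetical stand-in for the hard score memory: it remembers the normal snippets with the highest scores so they can be re-scored and penalized in later iterations. `pseudo_labels` is a hypothetical thresholding rule. All names, the max-pooling over snippets, and the default `margin` and `threshold` values are illustrative assumptions, not the authors' implementation; the Transformer encoder that would produce the scores is omitted.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def mil_margin_loss(abnormal_scores: Sequence[float],
                    normal_scores: Sequence[float],
                    margin: float = 1.0) -> float:
    """Hinge ranking loss between two MIL bags (assumed formulation).

    Each bag is represented by its highest snippet score; the loss is zero
    once the anomalous bag outscores the normal bag by at least `margin`.
    """
    top_abnormal = max(abnormal_scores)
    top_normal = max(normal_scores)
    return max(0.0, margin - top_abnormal + top_normal)


class HardScoreMemory:
    """Hypothetical memory of hard normal snippets (highest anomaly scores)."""

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity
        self._scores: Dict[str, float] = {}  # snippet id -> latest score

    def update(self, snippet_ids: Sequence[str],
               normal_scores: Sequence[float]) -> None:
        # A re-scored snippet overwrites its stale score.
        for sid, score in zip(snippet_ids, normal_scores):
            self._scores[sid] = score
        # Keep only the `capacity` hardest (highest-scoring) normal snippets.
        if len(self._scores) > self.capacity:
            keep = heapq.nlargest(self.capacity, self._scores.items(),
                                  key=lambda kv: kv[1])
            self._scores = dict(keep)

    def hard_samples(self) -> List[Tuple[str, float]]:
        """Remembered snippets, hardest first."""
        return sorted(self._scores.items(), key=lambda kv: kv[1], reverse=True)

    def hard_normal_loss(self, current_scores: Dict[str, float]) -> float:
        """Mean *current* score of remembered hard normals (target is 0)."""
        ids = [sid for sid in self._scores if sid in current_scores]
        if not ids:
            return 0.0
        return sum(current_scores[sid] for sid in ids) / len(ids)


def pseudo_labels(abnormal_scores: Sequence[float],
                  threshold: float = 0.5) -> List[int]:
    """Hypothetical snippet-level pseudo-labels for an anomalous video."""
    return [1 if s >= threshold else 0 for s in abnormal_scores]


if __name__ == "__main__":
    abnormal = [0.1, 0.2, 0.9, 0.3]   # snippet scores of an anomalous video
    normal = [0.05, 0.6, 0.1, 0.2]    # snippet scores of a normal video
    print(mil_margin_loss(abnormal, normal))

    memory = HardScoreMemory(capacity=2)
    memory.update(["n0", "n1", "n2", "n3"], normal)
    print(memory.hard_samples())      # the two highest-scoring normal snippets
    print(pseudo_labels(abnormal))
```

Storing snippet identifiers rather than bare scores is a deliberate choice in this sketch: stored scores go stale as the model trains, so the hard instances must be re-scored before their loss is computed.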

Enhancing video anomaly detection with learnable memory network: A new approach to memory-based auto-encoders

Memory Enhanced Spatial-Temporal Graph Convolutional Autoencoder for Human-Related Video Anomaly Detection

A novel spatio-temporal memory network for video anomaly detection

Anomaly detection in surveillance videos using Transformer with margin learning

Video Anomaly Detection Based on Global–Local Convolutional Autoencoder

Memory transformation networks for weakly supervised visual classification

Research on Video Anomaly Detection Based on Cascaded Memory-augmented Autoencoder

Enhancing Video Anomaly Detection Using a Transformer Spatiotemporal Attention Unsupervised Framework for Large Datasets

Memory-augmented Adversarial Autoencoders for Multivariate Time-series Anomaly Detection with Deep Reconstruction and Prediction

TransAnomaly: Video Anomaly Detection Using Video Vision Transformer

Video anomaly detection based on a multi-layer reconstruction autoencoder with a variance attention strategy

Video anomaly detection with memory-guided multilevel embedding

Spatiotemporal Masked Autoencoder with Multi-Memory and Skip Connections for Video Anomaly Detection

Memory-Augmented Spatial-Temporal Consistency Network for Video Anomaly Detection

Video Anomaly Detection Based on Attention Mechanism

Normal Learning in Videos with Attention Prototype Network

Memorizing Normality to Detect Anomaly: Memory-Augmented Deep Autoencoder for Unsupervised Anomaly Detection

Channel based approach via faster dual prediction network for video anomaly detection

Synthetic Pseudo Anomalies for Unsupervised Video Anomaly Detection: A Simple yet Efficient Framework based on Masked Autoencoder

A Transformer Architecture based mutual attention for Image Anomaly Detection

Attention-based residual autoencoder for video anomaly detection