Abstract: Weakly supervised video anomaly detection (WSVAD) is an active and challenging research problem in image and video processing. Prior work has typically formulated WSVAD as a multiple-instance learning (MIL) problem. However, many of these methods concentrate on the time periods in which anomalies are clearly discernible. To recognize anomalous events, they rely solely on detecting significant changes in appearance or motion, ignoring the temporal completeness and continuity that anomalous events inherently possess. They also disregard the subtle correlations at the transitional boundaries between normal and abnormal states. We therefore propose a weakly supervised approach to video anomaly detection based on a Transformer with margin learning. First, our network captures the temporal changes around the occurrence of anomalies by using Transformer blocks, which are adept at modeling long-range dependencies in anomalous events. Second, to handle hard cases, i.e., normal events that closely resemble anomalous events, we employ a hard score memory, which stores the anomaly scores of hard samples and enables iterative optimization on those hard instances. Third, to strengthen the discriminative capability of the model at the score level, we use pseudo-labels for anomalous events as supplementary supervision for detection. Experiments on two large-scale datasets, ShanghaiTech and UCF-Crime, demonstrate that the proposed method is sensitive to anomalous events and performs competitively against state-of-the-art methods.
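The abstract names two ingredients, a MIL formulation with margin learning and a hard score memory, without giving their form. The following is only a minimal sketch of how such pieces are commonly combined, not the authors' implementation. The hinge form of the loss, the names `mil_margin_loss`, `HardScoreMemory`, and `loss_with_memory`, and the `margin` and `capacity` parameters are all assumptions made for illustration. Each video is treated as a bag of per-snippet anomaly scores in [0, 1].

```python
from collections import deque


def mil_margin_loss(abnormal_scores, normal_scores, margin=1.0):
    """Hinge ranking loss on the top-scoring snippet of each bag.

    Assumption: the highest score in an abnormal video should exceed the
    highest score in a normal video by at least `margin`.
    """
    return max(0.0, margin - max(abnormal_scores) + max(normal_scores))


class HardScoreMemory:
    """Hypothetical fixed-size buffer of high scores seen on normal videos."""

    def __init__(self, capacity=4):
        # Oldest entries are dropped automatically once capacity is reached.
        self.scores = deque(maxlen=capacity)

    def update(self, normal_scores):
        # Store the most anomalous-looking snippet score of a normal video.
        self.scores.append(max(normal_scores))

    def hardest(self):
        # Return None while the memory is still empty.
        return max(self.scores) if self.scores else None


def loss_with_memory(abnormal_scores, normal_scores, memory, margin=1.0):
    """Base MIL loss plus an extra margin term against the stored hard score."""
    loss = mil_margin_loss(abnormal_scores, normal_scores, margin)
    hardest = memory.hardest()
    if hardest is not None:
        loss += max(0.0, margin - max(abnormal_scores) + hardest)
    # Remember the current normal bag for later iterations.
    memory.update(normal_scores)
    return loss


if __name__ == "__main__":
    memory = HardScoreMemory(capacity=4)
    print(loss_with_memory([0.1, 0.9, 0.2], [0.1, 0.3], memory))
    print(loss_with_memory([0.95], [0.05], memory))
```

In this sketch the memory keeps penalizing the model for a hard normal sample even after the training batch that produced it has passed, which is one plausible reading of "iterative optimization training on those hard instances".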

Transformer Based Spatial-Temporal Extraction Model for Video Anomaly Detection

Video Anomaly Detection Based on Spatio-Temporal Relationships among Objects

Spatio-Temporal-based Context Fusion for Video Anomaly Detection

TransAnomaly: Video Anomaly Detection Using Video Vision Transformer

Spatiotemporal consistency-enhanced network for video anomaly detection

Video anomaly detection based on attention and efficient spatio-temporal feature extraction

Learning Attention Augmented Spatial-temporal Normality for Video Anomaly Detection

Anomaly detection in surveillance videos using transformer based attention model

Anomaly Detection Via Local Coordinate Factorization And Spatio-Temporal Pyramid

Pedestrian Spatio-Temporal Information Fusion For Video Anomaly Detection

Anomaly detection in surveillance videos using Transformer with margin learning

Multi-Scale Temporal Relations and Segmented Channel Attention for Video Anomaly Detection

Enhancing Video Anomaly Detection Using a Transformer Spatiotemporal Attention Unsupervised Framework for Large Datasets

Configurable Spatial-Temporal Hierarchical Analysis for Flexible Video Anomaly Detection

Anomaly Detection with ELM-Based Visual Attribute and Spatio-temporal Pyramid

Real-world Video Anomaly Detection by Extracting Salient Features in Videos

Learning Task-Specific Representation for Video Anomaly Detection with Spatial-Temporal Attention

Anomaly Detection Via Midlevel Visual Attributes

Multi-Scale Video Anomaly Detection by Multi-Grained Spatio-Temporal Representation Learning

Learning to Detect Anomalies in Surveillance Video

A Two-Branch Network for Video Anomaly Detection with Spatio-Temporal Feature Learning