Abstract: Weakly supervised video anomaly detection (WS-VAD) aims to locate abnormal activities in untrimmed videos without frame-level supervision. Prior work has used graph convolution networks or self-attention mechanisms alongside a multiple instance learning (MIL)-based classification loss to model temporal relations and learn discriminative features. However, these approaches are limited in two respects: 1) multi-branch parallel architectures, while capturing multi-scale temporal dependencies, inevitably increase parameter and computational costs; 2) the binarized MIL constraint only ensures inter-class separability while neglecting fine-grained discriminability within anomalous classes. To this end, we introduce a novel WS-VAD framework that focuses on efficient temporal modeling and on intra-class discriminability of anomalies. We first construct a Temporal Context Aggregation (TCA) module that simultaneously captures local and global dependencies by reusing a single attention matrix together with adaptive context fusion. In addition, we propose a Prompt-Enhanced Learning (PEL) module that incorporates semantic priors through knowledge-based prompts to boost the discriminability of visual features while ensuring separability across anomaly subclasses. Extensive experiments validate the proposed components and demonstrate superior performance on three challenging datasets, UCF-Crime, XD-Violence and ShanghaiTech, with fewer parameters and lower computational cost. Notably, our method can significantly improve detection accuracy for certain anomaly subclasses and reduce the false alarm rate. Our code is available at: https://github.com/yujiangpu20/PEL4VAD.
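To illustrate what "reusing an attention matrix" for both local and global context can look like, the sketch below computes one attention matrix over per-snippet features. It derives a windowed local matrix from that matrix by masking and renormalizing, with no second attention pass. It then blends the two context features with a sigmoid-gated weight. This is a minimal pure-Python sketch under stated assumptions, not the PEL4VAD implementation. The following are hypothetical simplifications: no learned query/key/value projections, the `window` size, and the scalar `gate` standing in for "adaptive context fusion".

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) and a (k x m) matrix."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def global_attention(x: Matrix) -> Matrix:
    """Row-wise softmax of scaled dot-product similarities between snippets.

    Assumption: features are compared directly (no learned projections).
    """
    d = len(x[0])
    scores = matmul(x, [list(c) for c in zip(*x)])  # (T x T) similarity matrix
    attn = []
    for row in scores:
        scaled = [s / math.sqrt(d) for s in row]
        m = max(scaled)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in scaled]
        z = sum(exps)
        attn.append([e / z for e in exps])
    return attn


def local_from_global(attn: Matrix, window: int) -> Matrix:
    """Reuse the global attention matrix: zero entries with |i - j| > window and
    renormalize each row. This equals a softmax over the masked scores, so no
    second softmax is computed. Requires window >= 0 (the diagonal is kept)."""
    local = []
    for i, row in enumerate(attn):
        kept = [a if abs(i - j) <= window else 0.0 for j, a in enumerate(row)]
        z = sum(kept)
        local.append([a / z for a in kept])
    return local


def temporal_context(x: Matrix, window: int, gate: float) -> Matrix:
    """Fuse global and local context with a sigmoid-gated weight.

    `gate` is a hypothetical learnable scalar; the paper's fusion may differ.
    """
    attn_g = global_attention(x)                # attention computed once
    attn_l = local_from_global(attn_g, window)  # derived from the same matrix
    ctx_g = matmul(attn_g, x)                   # global context per snippet
    ctx_l = matmul(attn_l, x)                   # local context per snippet
    alpha = 1.0 / (1.0 + math.exp(-gate))       # fusion weight in (0, 1)
    return [[alpha * g + (1.0 - alpha) * l for g, l in zip(rg, rl)]
            for rg, rl in zip(ctx_g, ctx_l)]


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # 4 toy snippets
    fused = temporal_context(feats, window=1, gate=0.0)  # gate=0 -> equal blend
    print(len(fused), len(fused[0]))  # 4 2
```

The design point is that the masked local view costs only an elementwise mask and a renormalization, instead of a separate attention branch. This is the kind of saving over multi-branch architectures that the abstract refers to.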

Collaborative Normality Learning Framework for Weakly Supervised Video Anomaly Detection

Normality learning reinforcement for anomaly detection in surveillance videos

Stochastic video normality network for abnormal event detection in surveillance videos

Learning Appearance-motion Normality for Video Anomaly Detection

Rethinking Prediction-Based Video Anomaly Detection from Local-Global Normality Perspective

Weakly Supervised Video Anomaly Detection via Center-guided Discriminative Learning

Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection

Appearance-Motion united Auto-Encoder Framework for Video Anomaly Detection

Learning Attention Augmented Spatial-temporal Normality for Video Anomaly Detection

Generate anomalies from normal: a partial pseudo-anomaly augmented approach for video anomaly detection

Dual Memory Units with Uncertainty Regulation for Weakly Supervised Video Anomaly Detection

Towards Open Set Video Anomaly Detection

Diffusion-based normality pre-training for weakly supervised video anomaly detection

Learn Suspected Anomalies from Event Prompts for Video Anomaly Detection

Video anomaly detection based on a multi-layer reconstruction autoencoder with a variance attention strategy

Memory-Augmented Spatial-Temporal Consistency Network for Video Anomaly Detection

Interleaving One-Class and Weakly-Supervised Models with Adaptive Thresholding for Unsupervised Video Anomaly Detection

Generalized Video Anomaly Event Detection: Systematic Taxonomy and Comparison of Deep Models

Appearance Blur-driven AutoEncoder and Motion-guided Memory Module for Video Anomaly Detection

Video Anomaly Detection Based on Global–Local Convolutional Autoencoder

FE-VAD: High-Low Frequency Enhanced Weakly Supervised Video Anomaly Detection