Abstract: Video Anomaly Detection (VAD) has been an active research field for several decades. However, most existing approaches extract only a single type of feature from videos and define a single paradigm to indicate the extent of abnormality. In this work, a coarse-to-fine three-level prediction is built by integrating different levels of spatio-temporal representations, which better highlights the difference between normal and abnormal behaviors. First, an object-level trajectory prediction models the historical positions of humans using a graph transformer network. Subsequently, skeleton-level prediction is achieved by incorporating the positional information from the trajectory prediction. More importantly, based on the predicted skeleton, a skeleton-guided pixel-level region prediction is performed. A novel Skeleton Conditioned Generative Adversarial Network (SCGAN) is designed to explore the correlation between skeleton-level and pixel-level motion prediction. Thanks to SCGAN, the prediction of human regions draws on both coarse-grained and fine-grained motion features. This three-level prediction, named Progressive Prediction Video Anomaly Detection (P3VAD), enlarges the prediction error on irregular motion patterns. In addition, a pixel-level analysis method is proposed to achieve Background-bias Elimination (BE) and denoise the predicted region. This is the first effort to progressively combine three-level predictions from coarse to fine-grained for VAD. An extensive experimental evaluation on four publicly available large-scale benchmark datasets (ShanghaiTech, CUHK Avenue, IITB-Corridor, and ADOC), in both micro-AUC and macro-AUC metrics, validates the effectiveness of P3VAD.
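
The abstract reports results in both micro-AUC and macro-AUC. As a rough illustration of how these two frame-level metrics are commonly defined in the VAD literature (this is not taken from the paper, and the function names and toy scores below are hypothetical), micro-AUC pools the frames of all test videos before computing one AUC, while macro-AUC computes one AUC per video and averages them. The sketch assumes each frame has an anomaly score (for example, a prediction error) and a binary label.

```python
from typing import List, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that an abnormal frame outscores a normal one.

    Ties count as half a win. Labels: 1 = abnormal, 0 = normal.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one normal and one abnormal frame")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_auc(video_scores: List[List[float]], video_labels: List[List[int]]) -> float:
    """Concatenate the frames of all videos, then compute a single AUC."""
    all_scores = [s for video in video_scores for s in video]
    all_labels = [y for video in video_labels for y in video]
    return roc_auc(all_scores, all_labels)


def macro_auc(video_scores: List[List[float]], video_labels: List[List[int]]) -> float:
    """Compute one AUC per video and average them.

    Assumption of this sketch: videos containing only one class are skipped,
    because AUC is undefined for them.
    """
    aucs = [
        roc_auc(s, y)
        for s, y in zip(video_scores, video_labels)
        if 0 < sum(y) < len(y)
    ]
    return sum(aucs) / len(aucs)


# Toy data: two videos, three frames each; the last frame of each is abnormal.
scores = [[0.1, 0.2, 0.3], [0.6, 0.7, 0.9]]
labels = [[0, 0, 1], [0, 0, 1]]

print(micro_auc(scores, labels))  # 0.75
print(macro_auc(scores, labels))  # 1.0
```

In this toy case each video is ranked perfectly on its own (macro-AUC 1.0), but the second video's normal frames score higher than the first video's abnormal frame, so pooling the frames lowers the micro-AUC to 0.75. This is one reason the two metrics are often reported together.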

Effect of substitutional impurities (Al, Co, Fe, Ga) on the orthorhombic phase of YBa2Cu3O7-δ

A New Comprehensive Benchmark for Semi-supervised Video Anomaly Detection and Anticipation

Memory-Augmented Spatial-Temporal Consistency Network for Video Anomaly Detection

Spatiotemporal consistency-enhanced network for video anomaly detection

Patch Spatio-Temporal Relation Prediction for Video Anomaly Detection

Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection

Configurable Spatial-Temporal Hierarchical Analysis for Flexible Video Anomaly Detection

Multi-scale Spatial-temporal Interaction Network for Video Anomaly Detection

Learn Suspected Anomalies from Event Prompts for Video Anomaly Detection

Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation: A Unified Approach

Progressive prediction: Video anomaly detection via multi‐grained prediction

Video Anomaly Detection Based on Spatio-Temporal Relationships among Objects

Spatiotemporal Masked Autoencoder with Multi-Memory and Skip Connections for Video Anomaly Detection

Rethinking Prediction-Based Video Anomaly Detection from Local-Global Normality Perspective

Normality learning reinforcement for anomaly detection in surveillance videos

Bidirectional skip-frame prediction for video anomaly detection with intra-domain disparity-driven attention

Anomalies cannot materialize or vanish out of thin air: A hierarchical multiple instance learning with position-scale awareness for video anomaly detection

Denoising Diffusion-Augmented Hybrid Video Anomaly Detection Via Reconstructing Noised Frames

A Novel Unsupervised Video Anomaly Detection Framework Based on Optical Flow Reconstruction and Erased Frame Prediction

Hierarchical Semantic Contrast for Scene-aware Video Anomaly Detection

Attention-based residual autoencoder for video anomaly detection