Abstract: Video anomaly detection is an essential yet challenging open-set task in computer vision, often addressed by leveraging reconstruction as a proxy task. However, existing reconstruction-based methods encounter challenges in two main aspects: (1) limited model robustness in open-set scenarios, and (2) an overemphasis on, but restricted capacity for, detailed motion reconstruction. To this end, we propose a novel frequency-guided diffusion model with perturbation training, which enhances model robustness through perturbation training and emphasizes the principal motion components guided by motion frequencies. Specifically, we first use a trainable generator to produce perturbative samples for perturbation training of the diffusion model. During the perturbation training phase, model robustness is enhanced and the reconstruction domain of the model is broadened by training against this generator. Subsequently, perturbative samples are introduced at inference, which affects the reconstruction of normal and abnormal motions differently, thereby enhancing their separability. Considering that motion details originate from high-frequency information, we propose a masking method based on the 2D discrete cosine transform to separate high-frequency and low-frequency information. Guided by the high-frequency information from the observed motion, the diffusion model can focus on generating low-frequency information, and thus reconstruct the motion accurately. Experimental results on five video anomaly detection datasets, including human-related and open-set benchmarks, demonstrate the effectiveness of the proposed method. Our code is available at https://github.com/Xiaofeng-Tan/FGDMAD-Code.

What problem does this paper attempt to address?

This paper attempts to solve two main problems in video anomaly detection (VAD):

1. **Insufficient model robustness**: Existing reconstruction-based methods perform poorly in open-set scenarios, mainly because the models lack robustness to unseen normal samples. These methods usually train on identical inputs and outputs to learn normal patterns, which may lead the model to learn shortcuts. When normal motion is perturbed, the model struggles to reconstruct it with the learned shortcuts, and the motion is misclassified as abnormal.
2. **Over-focus on detailed motion reconstruction**: Existing methods do not distinguish between the principal components of a motion and its details. From a signal-processing perspective, the principal components and the details correspond to low-frequency and high-frequency information, respectively. Generating an approximate motion is relatively easy, but accurately reconstructing its details is very difficult, because differences in personal habits change the high-frequency information.

To solve these problems, the authors propose a new **Frequency-Guided Diffusion Model with Perturbation Training**. The main contributions of this method are as follows:

- **Enhancing model robustness through perturbation training**: A trainable Perturbative Example Generator produces perturbed samples for perturbation training. Through adversarial training, the diffusion model becomes robust to perturbed normal motion, and the separability between normal and abnormal events is enhanced.
- **Frequency-guided denoising process**: A two-dimensional discrete cosine transform (2D DCT) decomposes motion into high-frequency and low-frequency information. The observed high-frequency information guides the diffusion model to focus on generating low-frequency information, so that motion is reconstructed more accurately.
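The frequency split and fusion described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes a 1D orthonormal DCT-II along the time axis of a single joint coordinate (the paper uses a 2D DCT), and it uses a fixed index cutoff `n_low` in place of the paper's masks. The helper names `dct_matrix`, `split_frequencies`, and `fuse` are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix D (n x n); its transpose is its inverse."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matvec(m, v):
    """Matrix-vector product for plain Python lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def split_frequencies(x, n_low):
    """Split x into a low-frequency part and a high-frequency part.

    Assumption: coefficients with index < n_low count as low frequency.
    """
    d = dct_matrix(len(x))
    y = matvec(d, x)
    y_low = [c if k < n_low else 0.0 for k, c in enumerate(y)]
    y_high = [c if k >= n_low else 0.0 for k, c in enumerate(y)]
    d_t = transpose(d)
    return matvec(d_t, y_low), matvec(d_t, y_high)

def fuse(observed, generated, n_low):
    """Keep high frequencies of the observed motion and low frequencies
    of the generated motion, then transform back with the inverse DCT."""
    d = dct_matrix(len(observed))
    y_o, y_g = matvec(d, observed), matvec(d, generated)
    y_c = [g if k < n_low else o for k, (o, g) in enumerate(zip(y_o, y_g))]
    return matvec(transpose(d), y_c)

# Toy trajectory: slow drift (principal motion) plus fast jitter (detail).
x = [math.sin(2 * math.pi * t / 16) + 0.05 * (-1) ** t for t in range(16)]
low, high = split_frequencies(x, n_low=4)

# Hypothetical "generated" motion with a slightly different principal component.
generated = [math.sin(2 * math.pi * t / 16 + 0.1) for t in range(16)]
fused = fuse(x, generated, n_low=4)
```

Because the transform is linear and orthonormal, `low` and `high` sum back to `x`, and `fused` carries the generated motion's first `n_low` coefficients together with the observed motion's remaining ones.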
The workflow of this method has three steps:

1. **Perturbation generation**: The perturbation generator produces perturbed samples to expand the learning domain of the model.
2. **Perturbation training**: The perturbation generator and the noise predictor are optimized alternately so that the model can handle perturbed samples.
3. **Frequency information extraction and fusion**: In the inference stage, the low-frequency information of the generated motion is fused with the observed high-frequency information to improve reconstruction quality.

Experimental results on five public VAD datasets show that the method outperforms existing state-of-the-art methods, especially on open-set benchmarks.

### Formula summary

- Perturbation generation:
  \[ \delta=\lambda\cdot\text{sign}(\nabla_x L(x,\theta)) \]
  \[ \hat{x}=x+\delta \]
- Noise addition process:
  \[ x_t=\sqrt{\bar{\alpha}_t}\,x+\sqrt{1-\bar{\alpha}_t}\,\epsilon \]
- Noise prediction loss:
  \[ L(x,\theta)=\mathbb{E}_{x,\epsilon,t}\left[\|\epsilon-\epsilon_\theta(x_t,t,c)\|_2^2\right] \]
- Frequency information extraction:
  \[ y=\text{DCT}(\bar{x})=D\bar{x} \]
  \[ \bar{x}=\text{iDCT}(y)=D^T y \]
- Frequency information fusion:
  \[ y_c^t=y_o^t\odot M_h(y_o^t)+y_g^t\odot M_l(y_g^t) \]
  \[ \bar{x}_c^t=\text{iDCT}(y_c^t) \]

Together, these components improve performance on the video anomaly detection task, especially in open-set scenarios.
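The noising and perturbation formulas can be made concrete with a toy sketch. Several assumptions apply: the noise predictor is replaced by a hypothetical one-parameter model `eps_theta(x_t) = w * x_t`, the loss is evaluated on one sample instead of as an expectation, and the gradient is taken with respect to the input motion `x` so that the perturbation has the same shape as `x`. This covers only the sign-gradient formula and does not reproduce the trainable generator described in the paper.

```python
import math
import random

def q_sample(x, eps, alpha_bar_t):
    """Forward noising: x_t = sqrt(abar_t) * x + sqrt(1 - abar_t) * eps."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * xi + b * ei for xi, ei in zip(x, eps)]

def toy_loss_and_grad(x, eps, alpha_bar_t, w):
    """Squared noise-prediction error for the toy predictor w * x_t,
    together with its analytic gradient with respect to x."""
    a = math.sqrt(alpha_bar_t)
    x_t = q_sample(x, eps, alpha_bar_t)
    residual = [ei - w * xti for ei, xti in zip(eps, x_t)]
    loss = sum(r * r for r in residual)
    grad = [-2.0 * w * a * r for r in residual]  # d(loss)/d(x_i)
    return loss, grad

def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def perturb(x, grad, lam):
    """x_hat = x + lam * sign(grad): a bounded step that increases the loss."""
    return [xi + lam * sign(g) for xi, g in zip(x, grad)]

rng = random.Random(0)                       # fixed seed for repeatability
x = [math.sin(0.4 * t) for t in range(8)]    # toy "normal" motion
eps = [rng.gauss(0.0, 1.0) for _ in x]       # Gaussian noise sample

loss, grad = toy_loss_and_grad(x, eps, alpha_bar_t=0.7, w=0.3)
x_hat = perturb(x, grad, lam=0.05)
loss_hat, _ = toy_loss_and_grad(x_hat, eps, alpha_bar_t=0.7, w=0.3)
```

Each coordinate of `x_hat` differs from `x` by at most `lam`, and since the toy loss is convex in `x`, the perturbed sample has a loss no smaller than the original. Training the predictor on such samples is the intuition behind making it robust to perturbed normal motion.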

Frequency-Guided Diffusion Model with Perturbation Training for Skeleton-Based Video Anomaly Detection
