Abstract: The limited domain generalization capability of contemporary video anomaly detection methods restricts their efficacy to the specific datasets they were trained on. To improve the generalizability and portability of video anomaly detection models, we propose a domain adaptation network framework with robust generalization performance. The framework aims to let a video anomaly detection model generalize from the source domain to an unseen target domain while mitigating the impact of missing labeled data on deep architectures. It incorporates a graph-based domain-invariant representation learning module and a domain discriminator, which together enable the model to learn deep features whose properties remain unchanged across domains; this is achieved by calculating the strength of the relationships among domain nodes. Notably, inspired by domain adversarial learning, the framework uses a gradient reversal layer that acts during backpropagation: it reverses the gradient of the domain loss so that the feature-mapping parameters are optimized in the direction opposite to that of the domain discriminator. To address the domain generalization problem in video anomaly detection, the framework applies graph convolution with a novel adjacency matrix that encourages high coherence within each domain while optimizing the mapping of low-level deep features from the source to the target domain, which improves the discriminative performance of the video anomaly detection model in the target domain. Experiments were conducted on the Avenue, UCSD-Ped1, UCSD-Ped2, ShanghaiTech, UCF-Crime, and TAD datasets, using labeled data from the source domain(s) during training. The results show that our framework enables models trained on one or more scenes (domains) to perform well on unseen scenes (domains), achieving good cross-domain AUC.
For example, when trained on multiple domains and tested on the Avenue dataset, our domain adversarial learning framework improves detection accuracy by 12.47%. In severe single-domain generalization scenarios, the AUC on the target domain (e.g., the UCF-Crime dataset) increases by 4.36%, 8.64%, and 3.68%, respectively.
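
For reference, the standard domain-adversarial (DANN) objective of Ganin and Lempitsky, which a gradient reversal layer optimizes, is written below. The notation is an assumption, since the abstract does not state whether the paper modifies this objective: G_f is the feature extractor, G_y the anomaly scorer, G_d the domain discriminator, d_i the domain label of training sample x_i from the source domains S, lambda the reversal weight, and mu the learning rate.

```latex
\begin{aligned}
E(\theta_f,\theta_y,\theta_d) &= \sum_{i\in\mathcal{S}} L_y\big(G_y(G_f(x_i;\theta_f);\theta_y),\,y_i\big)
  - \lambda \sum_{i\in\mathcal{S}} L_d\big(G_d(G_f(x_i;\theta_f);\theta_d),\,d_i\big) \\
(\hat\theta_f,\hat\theta_y) &= \arg\min_{\theta_f,\theta_y} E(\theta_f,\theta_y,\hat\theta_d), \qquad
\hat\theta_d = \arg\max_{\theta_d} E(\hat\theta_f,\hat\theta_y,\theta_d) \\
\theta_f &\leftarrow \theta_f - \mu\Big(\frac{\partial L_y}{\partial\theta_f} - \lambda\,\frac{\partial L_d}{\partial\theta_f}\Big), \qquad
\theta_d \leftarrow \theta_d - \mu\,\frac{\partial L_d}{\partial\theta_d}
\end{aligned}
```

Maximizing E over theta_d is the same as minimizing L_d, so the discriminator descends its own loss while the feature extractor receives the reversed, lambda-scaled gradient of that loss.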
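
A minimal NumPy sketch of a gradient reversal layer, assuming a single linear layer W as the feature extractor and a binary logistic domain discriminator (v, b). The paper's modules are deep networks, and its discriminator may be multi-class for multi-domain training; all names here are hypothetical. The layer is the identity in the forward pass and multiplies the incoming gradient by -lam in the backward pass, so one ordinary descent step lowers the discriminator's domain loss while, to first order, raising it for the feature extractor.

```python
import numpy as np


class GradientReversal:
    """Hypothetical GRL: identity forward, gradient scaled by -lam backward."""

    def __init__(self, lam: float = 1.0):
        self.lam = lam

    def forward(self, x: np.ndarray) -> np.ndarray:
        return x  # features pass through unchanged

    def backward(self, grad_output: np.ndarray) -> np.ndarray:
        return -self.lam * grad_output  # flip (and scale) the gradient


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))


def domain_bce(p: float, y: int) -> float:
    """Binary cross-entropy of the domain discriminator (y=1: one domain, y=0: the other)."""
    eps = 1e-12
    return float(-(y * np.log(p + eps) + (1 - y) * np.log(1.0 - p + eps)))


def adversarial_step(W, v, b, x, y, grl, lr=0.1):
    """One SGD step on the domain loss for a linear extractor W and a
    logistic discriminator (v, b) joined by a GRL.
    Returns (W_new, v_new, b_new, loss_before)."""
    f = grl.forward(W @ x)          # deep feature (here: one linear layer)
    p = sigmoid(v @ f + b)          # discriminator's P(y = 1)
    loss = domain_bce(p, y)

    dlogit = p - y                  # dL/dlogit for sigmoid + BCE
    grad_v, grad_b = dlogit * f, dlogit
    grad_f = dlogit * v             # dL/df, before reversal
    grad_W = np.outer(grl.backward(grad_f), x)  # reversed on its way to W

    # Both parameter sets take an ordinary descent step; because of the GRL,
    # the step for W moves *up* the domain loss.
    return W - lr * grad_W, v - lr * grad_v, b - lr * grad_b, loss
```

In a full model the extractor would also receive the unreversed gradient of the anomaly-detection loss, and lam sets the trade-off between the two signals.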
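
The abstract does not give the formula for its adjacency matrix, so the following is only an illustrative sketch under stated assumptions: nodes are per-sample deep features, relationship strength is non-negative cosine similarity, and same-domain edges are weighted by a hypothetical same_w while cross-domain edges get a smaller hypothetical cross_w. The propagation rule is the standard symmetric-normalized graph convolution of Kipf and Welling, swapped in here for illustration rather than taken from the paper.

```python
import numpy as np


def domain_aware_adjacency(feats, domains, same_w=1.0, cross_w=0.1):
    """Hypothetical adjacency: non-negative cosine similarity between node
    features, weighted by same_w for same-domain pairs and cross_w otherwise."""
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    sim = np.clip(normed @ normed.T, 0.0, None)   # relationship strength in [0, 1]
    same = domains[:, None] == domains[None, :]   # True where two nodes share a domain
    A = sim * np.where(same, same_w, cross_w)
    np.fill_diagonal(A, 0.0)                      # self-loops are added in gcn_layer
    return A


def gcn_layer(A, H, W):
    """One graph convolution (Kipf & Welling style): ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)
```

Raising cross_w toward same_w lets more information flow between domains; lowering it keeps each domain's cluster tighter, which is one possible reading of "high coherence within the same domain".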

Video-Audio Domain Generalization Via Confounder Disentanglement

VideoDG: Generalizing Temporal Relations in Videos to Novel Domains

Confidence Attention and Generalization Enhanced Distillation for Continuous Video Domain Adaptation

Unsupervised Domain Adaptation for Video Object Grounding with Cascaded Debiasing Learning

Learning Explicit and Implicit Latent Common Spaces for Audio-Visual Cross-Modal Retrieval

Dual Domain-Adversarial Learning for Audio-Visual Saliency Prediction

Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey

Graph-based domain adversarial learning framework for video anomaly detection domain generalization

Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding

Cross-Domain Learning for Video Anomaly Detection with Limited Supervision

Cross-domain video action recognition via adaptive gradual learning

Multi-view Adversarial Discriminator: Mine the Non-causal Factors for Object Detection in Unseen Domains

Cross-Domain Video Anomaly Detection without Target Domain Adaptation

Simplifying Open-Set Video Domain Adaptation with Contrastive Learning

Instrumental Variable-Driven Domain Generalization with Unobserved Confounders

Audio-Visual Information Fusion Using Cross-Modal Teacher-Student Learning for Voice Activity Detection in Realistic Environments

SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context

Benchmarking Cross-Domain Audio-Visual Deception Detection

From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation

Learning A Self-Supervised Domain-Invariant Feature Representation for Generalized Audio Deepfake Detection