FE-VAD: High-Low Frequency Enhanced Weakly Supervised Video Anomaly Detection

Ruoyan Pi, Jinglin Xu, Yuxin Peng
DOI: https://doi.org/10.1109/icme57554.2024.10688326
2024-01-01
Abstract: Weakly Supervised Video Anomaly Detection (WS-VAD) aims to identify anomalous events in videos using only video-level labels rather than frame-level ones. Previous works have usually focused on modeling anomalies in the spatio-temporal domain. However, anomalies take many forms, so modeling them only in the spatio-temporal domain is insufficient. To address this issue and capture the diverse forms of anomalies, we propose a new approach, High-Low Frequency Enhanced Weakly Supervised Video Anomaly Detection (FE-VAD), which introduces frequency-domain information to capture and analyze anomaly features at different frequency levels, facilitating the learning of local and global spatio-temporal dependencies. FE-VAD is composed of a temporal strengthening network (TSN) and a high-low frequency enhancement network (HLFN). TSN enhances the anomaly features in the traditional spatio-temporal domain, while HLFN decouples and adjusts high- and low-frequency information spatially and temporally. In FE-VAD, frequency-domain analysis offers a complementary perspective for describing anomalous events that are difficult to detect in the traditional spatio-temporal domain. Extensive experiments show that FE-VAD achieves state-of-the-art results on three datasets: ShanghaiTech, UCF-Crime, and XD-Violence.
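To make the idea of decoupling and adjusting high- and low-frequency information concrete, the following is a minimal sketch on a single 1-D temporal sequence. It is not the authors' HLFN, which the abstract describes as a network operating spatially and temporally; it only illustrates the general technique of splitting a signal into two frequency bands with a discrete Fourier transform and recombining them with different weights. The function names, the `cutoff` parameter, and the gains are all hypothetical choices made for this illustration.

```python
import cmath


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft; returns a complex sequence."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def split_high_low(signal, cutoff):
    """Split a real 1-D sequence into low- and high-frequency parts.

    A bin k is treated as "low" when min(k, n - k) <= cutoff, which keeps
    the mask symmetric so both parts stay real-valued. The cutoff is an
    assumed, hand-picked parameter, not something taken from the paper.
    """
    n = len(signal)
    spectrum = dft(signal)
    low_spec = [c if min(k, n - k) <= cutoff else 0 for k, c in enumerate(spectrum)]
    high_spec = [c - l for c, l in zip(spectrum, low_spec)]
    low = [v.real for v in idft(low_spec)]
    high = [v.real for v in idft(high_spec)]
    return low, high


def reweight(signal, cutoff, low_gain, high_gain):
    """Recombine the two bands with separate (hypothetical) gains."""
    low, high = split_high_low(signal, cutoff)
    return [low_gain * l + high_gain * h for l, h in zip(low, high)]


# Toy per-snippet feature: mostly flat, with one abrupt spike at index 8.
scores = [0.1] * 16
scores[8] = 1.0
low, high = split_high_low(scores, cutoff=2)
enhanced = reweight(scores, cutoff=2, low_gain=1.0, high_gain=2.0)
```

In this toy setting, the low band carries the slowly varying trend and the high band carries abrupt changes such as the spike, so raising `high_gain` emphasizes sudden events while raising `low_gain` emphasizes gradual ones. A learned model would replace the fixed cutoff and gains with trainable components.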