Memory-guided Representation Matching for Unsupervised Video Anomaly Detection

Yiran Tao, Yaosi Hu, Zhenzhong Chen
DOI: https://doi.org/10.1016/j.jvcir.2024.104185
IF: 2.887
2024-01-01
Journal of Visual Communication and Image Representation
Abstract: Recent works on Video Anomaly Detection (VAD) have made advancements in the unsupervised setting, known as Unsupervised VAD (UVAD), which brings it closer to practical applications. Unlike the classic VAD task, which requires a clean training set containing only normal events, UVAD aims to identify abnormal frames without any labeled normal/abnormal training data. Many existing UVAD methods employ handcrafted surrogate tasks, such as frame reconstruction, to address this challenge. However, we argue that these surrogate tasks are sub-optimal solutions, inconsistent with the essence of anomaly detection. In this paper, we propose a novel approach for UVAD that directly detects anomalies based on similarities between events in videos. Our method generates representations for events while simultaneously capturing prototypical normality patterns, and detects anomalies based on whether an event’s representation matches the captured patterns. The proposed model comprises a memory module that captures normality patterns and a representation learning network that produces representations matching the memory module for normal events. A pseudo-label generation module and an anomalous event generation module for negative learning are further designed to help the model work under the strictly unsupervised setting. Experimental results demonstrate that the proposed method outperforms existing UVAD methods and achieves competitive performance compared with classic VAD methods.
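The matching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure or the scoring rule, so the cosine similarity, the `1 - max similarity` score, and the names `cosine_similarity`, `anomaly_score`, and `memory` are all assumptions made for illustration. The memory is modeled as a fixed list of prototype vectors, whereas the paper's memory module is learned.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def anomaly_score(representation, memory):
    """Hypothetical score: 1 minus the best match against any memory prototype.

    A representation that closely matches some stored normality pattern
    scores near 0; one that matches none of them scores higher.
    """
    best_match = max(cosine_similarity(representation, p) for p in memory)
    return 1.0 - best_match


# Toy memory with two "normal" prototypes (assumed values, not from the paper).
memory = [[1.0, 0.0], [0.0, 1.0]]

normal_event = [0.9, 0.1]   # close to the first prototype
odd_event = [-1.0, -1.0]    # points away from both prototypes

normal_score = anomaly_score(normal_event, memory)
odd_score = anomaly_score(odd_event, memory)
```

In this toy setup the event near a prototype receives a lower score than the event that matches neither, which is the behavior the abstract describes at a high level. How the representations and prototypes are trained without labels is the substance of the paper and is not captured here.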