Enhancing video anomaly detection with learnable memory network: A new approach to memory-based auto-encoders

Zhiqiang Wang, Xiaojing Gu, Xingsheng Gu, Jingyu Hu
DOI: https://doi.org/10.1016/j.cviu.2024.103946
IF: 4.886
2024-02-03
Computer Vision and Image Understanding
Abstract: The aim of video anomaly detection is to detect anomalous events in a video sequence. In an unsupervised setting, enhancing detection accuracy hinges on the ability to learn normal features during the training phase and subsequently generate large errors when abnormal video frames are encountered during the testing phase. The transformer is an innovative neural network that utilizes a self-attention mechanism to extract intrinsic features, thereby proving more effective in extracting normal features. When paired with convolutional neural networks (CNNs), known for their proficiency in local information extraction, this hybrid architecture becomes particularly adept at handling numerous vision tasks. However, research exploring the full potential of such a hybrid architecture network for video anomaly detection is still in its early stages. In this paper, we introduce a novel approach to integrating transformers and CNNs for video anomaly detection. Here, the transformer functions as a memory module (TransMem) that processes latent features and incorporates them into CNN-based autoencoders (AEs). This approach significantly reduces computational complexity compared to directly processing video frames. Moreover, unlike other similarity-based memory methods, the proposed memory module is learnable. TransMem is a lightweight, plug-and-play module that can be seamlessly integrated into other complex frameworks to further enhance detection accuracy. Extensive experiments have demonstrated the effectiveness of our proposed method.
computer science, artificial intelligence; engineering, electrical & electronic
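
The abstract describes a transformer that acts as a memory module on the latent features of a CNN autoencoder rather than on raw frames. The sketch below illustrates only that general idea and is not the authors' TransMem implementation: it flattens a C x H x W latent feature map into H*W tokens and applies one single-head self-attention step with a residual connection. The function names (`flatten_latent`, `attention_memory`), the residual connection, the random weights standing in for learned projections, and the omission of positional encodings and feed-forward layers are all assumptions made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as nested lists (rows of a, columns of b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_latent(feature_map):
    """Turn a C x H x W latent feature map into (H*W) tokens of dimension C."""
    c = len(feature_map)
    h = len(feature_map[0])
    w = len(feature_map[0][0])
    return [[feature_map[ch][i][j] for ch in range(c)]
            for i in range(h) for j in range(w)]


def attention_memory(tokens, w_q, w_k, w_v):
    """Hypothetical memory step: single-head self-attention plus residual.

    tokens: N x D latent tokens; w_q, w_k, w_v: D x D projection matrices
    (learnable in a real model, plain lists here).
    Returns (updated N x D tokens, N x N attention weights).
    """
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    v = matmul(tokens, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]          # transpose K to D x N
    scores = matmul(q, k_t)                       # N x N similarity scores
    weights = [softmax([s * scale for s in row]) for row in scores]
    attended = matmul(weights, v)                 # N x D weighted values
    # Residual connection (an assumption): add attended features to the input.
    updated = [[t + a for t, a in zip(t_row, a_row)]
               for t_row, a_row in zip(tokens, attended)]
    return updated, weights


# Toy usage with an assumed 4-channel, 2 x 3 latent map and random weights.
random.seed(0)
C, H, W = 4, 2, 3
latent = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)]
          for _ in range(C)]


def rand_proj():
    return [[random.uniform(-0.5, 0.5) for _ in range(C)] for _ in range(C)]


tokens = flatten_latent(latent)
updated, attn = attention_memory(tokens, rand_proj(), rand_proj(), rand_proj())
print(len(updated), len(updated[0]))  # number of tokens, token dimension
```

The attention matrix has one entry per pair of tokens, so its size grows with the square of the token count. Running attention on a spatially downsampled latent map therefore involves far fewer pairs than running it on tokens taken from full-resolution frames, which is consistent with the complexity argument in the abstract.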