Attention-Driven Loss for Anomaly Detection in Video Surveillance

Joey Tianyi Zhou, Le Zhang, Zhiwen Fang, Jiawei Du, Xi Peng, Yang Xiao
DOI: https://doi.org/10.1109/tcsvt.2019.2962229
IF: 5.859
2020-12-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Recent video anomaly detection methods focus on reconstructing or predicting frames. Under this umbrella, the long-standing inter-class data-imbalance problem manifests as an imbalance between foreground objects and the stationary background, which existing solutions have largely left uninvestigated. Naively optimizing the reconstruction loss biases the optimization towards reconstructing the background rather than the objects of interest in the foreground. To solve this, we propose a simple yet effective solution, termed attention-driven loss, to alleviate the foreground-background imbalance problem in anomaly detection. Specifically, we compute a single mask map that summarizes the frame evolution of moving foreground regions and suppresses the background in the training video clips. We then construct an attention map by combining the mask map and the background, so that the foreground and background regions receive different weights. The proposed attention-driven loss is independent of the backbone network and can easily be incorporated into most existing anomaly detection models. Augmented with attention-driven loss, the model achieves an AUC of 86.0% on Avenue, 83.9% on Ped1, and 96% on Ped2. Extensive experimental results and ablation studies further validate the effectiveness of our model.
Subject category: Engineering, Electrical & Electronic
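The abstract gives no formulas for the mask map or the attention map, so the following NumPy sketch is only an illustration under stated assumptions. It takes the mask to be the min-max-normalized sum of absolute differences between consecutive grayscale frames in a clip (one simple way to "summarize frame evolution"), and the attention map to be a linear blend `fg_weight * M + bg_weight * (1 - M)`. The function names `motion_mask` and `attention_map` and the weights 1.0 and 0.1 are hypothetical, not taken from the paper.

```python
import numpy as np


def motion_mask(clip: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Summarize foreground motion in a clip as a single map in [0, 1].

    clip: grayscale frames with shape (T, H, W), T >= 2.
    Assumption (not the paper's exact rule): accumulate absolute
    differences between consecutive frames, then min-max normalize,
    so pixels of a static background stay at or near 0.
    """
    clip = clip.astype(np.float64)
    if clip.ndim != 3 or clip.shape[0] < 2:
        raise ValueError("clip must have shape (T, H, W) with T >= 2")
    # Sum of |frame[t+1] - frame[t]| over the clip: large where things move.
    evolution = np.abs(np.diff(clip, axis=0)).sum(axis=0)
    lo, hi = evolution.min(), evolution.max()
    # eps avoids division by zero for a completely static clip.
    return (evolution - lo) / (hi - lo + eps)


def attention_map(mask: np.ndarray, fg_weight: float = 1.0,
                  bg_weight: float = 0.1) -> np.ndarray:
    """Blend the mask with the background: moving pixels get about
    fg_weight, fully static pixels get bg_weight (hypothetical weights)."""
    return fg_weight * mask + bg_weight * (1.0 - mask)


if __name__ == "__main__":
    # Toy clip: a bright 2x2 square sliding right over a black background.
    clip = np.zeros((3, 6, 6))
    for t in range(3):
        clip[t, 2:4, t:t + 2] = 1.0
    print(attention_map(motion_mask(clip)).round(2))
```

On the toy clip, every pixel the square passes over gets a weight of about 1.0 and the untouched background gets 0.1. Frame differencing is chosen here only because it is the simplest motion cue; the authors may use a different operator.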
What problem does this paper attempt to address?
This paper addresses anomaly detection in video surveillance, focusing on the imbalance between foreground and background in video data. Existing video anomaly detection methods usually reconstruct or predict video frames, but they largely overlook the data imbalance between foreground objects (such as moving people or objects) and the static background. Because background pixels dominate each frame, optimizing a plain reconstruction loss biases the model toward reconstructing the background rather than the foreground objects, which weakens anomaly detection.

To address this, the authors propose a simple and effective method called attention-driven loss. They first compute a single mask map that summarizes the frame evolution of moving foreground regions and suppresses the background in the training video clips. They then combine the mask map with the background to construct an attention map that assigns different weights to the foreground and background regions. As a result, the model pays more attention to foreground objects during optimization, which improves anomaly detection accuracy. The loss is independent of the backbone network and can easily be integrated into most existing anomaly detection models.

With attention-driven loss, the model reaches an AUC of 86.0% on Avenue, 83.9% on Ped1, and 96% on Ped2. Extensive experiments and ablation studies further confirm the effectiveness of the method.
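Given any per-pixel attention map, the loss side can be sketched as a weighted mean of squared errors between a predicted (or reconstructed) frame and the ground-truth frame. The squared-error form, the normalization by total weight, and the name `attention_weighted_l2` are assumptions for illustration; the paper's actual loss may differ or combine the attention map with other terms. The toy example contrasts two predictions that make the same amount of pixel error, once on a foreground patch and once on the background.

```python
import numpy as np


def attention_weighted_l2(pred: np.ndarray, target: np.ndarray,
                          attention: np.ndarray) -> float:
    """Weighted mean of squared errors with per-pixel weights (a sketch).

    pred, target: frames of shape (H, W) or (H, W, C).
    attention: non-negative weights of shape (H, W), larger on foreground.
    """
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    err = (pred.astype(np.float64) - target.astype(np.float64)) ** 2
    w = attention.astype(np.float64)
    if err.ndim == 3:
        w = w[..., None]  # share each pixel's weight across channels
    w = np.broadcast_to(w, err.shape)
    # Dividing by the total weight makes this a weighted mean, so uniform
    # weights recover the ordinary mean squared error.
    return float((w * err).sum() / w.sum())


if __name__ == "__main__":
    target = np.zeros((10, 10))
    attention = np.full((10, 10), 0.1)
    attention[4:6, 4:6] = 1.0  # hypothetical 2x2 foreground region

    fg_miss = target.copy()
    fg_miss[4:6, 4:6] = 1.0  # error on the foreground patch
    bg_miss = target.copy()
    bg_miss[0:2, 0:2] = 1.0  # same-size error on the background

    for name, pred in [("foreground", fg_miss), ("background", bg_miss)]:
        plain = np.mean((pred - target) ** 2)
        weighted = attention_weighted_l2(pred, target, attention)
        print(name, plain, weighted)
```

Both toy predictions have the same plain MSE (0.04), but the weighted loss is ten times larger when the error falls on the foreground (about 0.294 versus 0.029), which is the kind of rebalancing the summary above describes.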