Multiscale Recovery Diffusion Model with Unsupervised Learning for Video Anomaly Detection System
Bo Li, Hongwei Ge, Yuxuan Liu, Guozhi Tang
DOI: https://doi.org/10.1109/tii.2024.3493390
IF: 12.3
2024-01-01
IEEE Transactions on Industrial Informatics
Abstract: The rapid development of intelligent industry and smart cities has increased the number of surveillance devices, heightening the need for unsupervised, automatic anomaly detection in real-time video surveillance that works on raw data without laborious manual annotation. Existing video anomaly detection (VAD) methods are limited by their reliance on pretext tasks, such as reconstruction or prediction, to identify abnormal events, because these tasks are not fully consistent with, or complementary to, the essential objective of anomaly detection. Motivated by recent advances in diffusion models, we propose a multiscale recovery diffusion model built on a novel pretext task named recovery, which introduces the notion of generation speed. It exploits the critical step-by-step generation process of diffusion probabilistic models in unsupervised anomaly detection scenarios. By incorporating a proposed multiscale spatial-temporal subtraction module, our model captures more detailed appearance and motion information of foreground objects without relying on other high-level pretrained models. Furthermore, an innovative push–pull loss widens the disparity between normal and abnormal events through pseudolabels. We validate our model on five established benchmarks: UCSD Ped1, UCSD Ped2, CUHK Avenue, ShanghaiTech, and UCF-Crime, achieving frame-level areas under the curve (AUC) of 86.01%, 99.23%, 92.35%, 82.49%, and 74.79%, respectively, surpassing other state-of-the-art unsupervised VAD methods.
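For readers unfamiliar with the frame-level AUC metric reported in the abstract, the following is a minimal sketch of how such a score is commonly computed. It is not the authors' evaluation code. It assumes each frame has a scalar anomaly score (higher means more anomalous) and a binary ground-truth label; the function name `frame_level_auc` and the toy scores are hypothetical.

```python
def frame_level_auc(scores, labels):
    """Area under the ROC curve over individual frames.

    Uses the Mann-Whitney formulation: the AUC equals the probability that
    a randomly chosen abnormal frame receives a higher anomaly score than a
    randomly chosen normal frame, with ties counted as one half.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    abnormal = [s for s, y in zip(scores, labels) if y == 1]
    normal = [s for s, y in zip(scores, labels) if y == 0]
    if not abnormal or not normal:
        raise ValueError("need at least one normal and one abnormal frame")

    wins = 0.0
    for a in abnormal:
        for n in normal:
            if a > n:
                wins += 1.0   # abnormal frame correctly ranked above normal
            elif a == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(abnormal) * len(normal))


# Toy example (made-up values, not results from the paper):
# six frames, three labelled abnormal (1) and three normal (0).
example_scores = [0.10, 0.40, 0.80, 0.35, 0.30, 0.90]
example_labels = [0, 0, 1, 1, 0, 1]
example_auc = frame_level_auc(example_scores, example_labels)
print(f"frame-level AUC = {example_auc:.4f}")
```

The pairwise loop is quadratic in the number of frames and is written for clarity; a sort-based rank computation gives the same value and scales better for full benchmark videos.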