Video Anomaly Detection via Visual Cloze Tests
Guang Yu, Siqi Wang, Zhiping Cai, Xinwang Liu, En Zhu, Jianping Yin
DOI: https://doi.org/10.1109/tifs.2023.3300094
IF: 7.231
2023-01-01
IEEE Transactions on Information Forensics and Security
Abstract: Although deep neural networks (DNNs) have sparked great progress in video anomaly detection (VAD), existing solutions still fall short in two aspects: 1) the extraction of video events cannot be both precise and comprehensive; 2) semantics and temporal context are under-explored. To tackle these issues, we draw inspiration from cloze tests in language education and propose a novel approach named Visual Cloze Completion (VCC), which performs VAD by completing visual cloze tests (VCTs). Specifically, VCC first localizes each video event and encloses it in a spatio-temporal cube (STC). To make event extraction both precise and comprehensive, appearance and motion are used as complementary cues to mark the object region associated with each event. For each marked region, a normalized patch sequence is extracted from several neighboring frames and stacked into an STC. Regarding each patch as a visual “word” and the patch sequence of an STC as a visual “sentence”, we deliberately erase a certain “word” (patch) to yield a VCT. The VCT is then completed by training DNNs to infer the erased patch and its optical flow from video semantics. Meanwhile, VCC fully exploits temporal context by alternately erasing each patch within the temporal context, thereby creating multiple VCTs. Furthermore, we propose localization-level, event-level, model-level and decision-level solutions that further exploit VCC’s potential and yield significant improvements in VAD performance. Extensive experiments demonstrate that VCC achieves highly competitive VAD performance.
Computer Science, Theory & Methods; Engineering, Electrical & Electronic
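The cloze construction described in the abstract (erase one patch of an STC, infer it from the remaining patches, repeat for every position) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each patch is a flattened list of normalized pixel values, and the names `make_vcts`, `mean_patch_predictor` and `cloze_anomaly_score` are hypothetical. The trained DNN is replaced by a trivial stand-in that predicts the erased patch as the element-wise mean of the remaining patches, optical-flow prediction is omitted, and averaging completion errors over all VCTs is an assumed fusion rule rather than the paper's decision-level method.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: a patch is a flattened list of normalized pixel values.
Patch = List[float]
# A VCT here is (context patches with one erased, index of the erased patch, erased patch).
VCT = Tuple[List[Patch], int, Patch]


def make_vcts(stc: List[Patch]) -> List[VCT]:
    """Erase each patch of an STC in turn, yielding one cloze test per position."""
    if len(stc) < 2:
        raise ValueError("an STC needs at least two patches to form a cloze test")
    return [(stc[:t] + stc[t + 1:], t, stc[t]) for t in range(len(stc))]


def mean_patch_predictor(context: List[Patch], erased_index: int) -> Patch:
    """Placeholder for a trained DNN (assumption): predicts the erased patch as the
    element-wise mean of the remaining patches and ignores erased_index."""
    n = len(context)
    return [sum(values) / n for values in zip(*context)]


def mse(a: Patch, b: Patch) -> float:
    """Mean squared error between two equal-length patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cloze_anomaly_score(
    stc: List[Patch],
    predictor: Callable[[List[Patch], int], Patch] = mean_patch_predictor,
) -> float:
    """Average completion error over all VCTs built from one STC.

    Averaging is an assumed fusion rule; a higher score means the erased
    patches were harder to infer from their temporal context.
    """
    errors = [mse(predictor(ctx, t), target) for ctx, t, target in make_vcts(stc)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    steady = [[0.5, 0.5]] * 3                      # three identical patches
    abrupt = [[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]]  # last patch changes suddenly
    print(cloze_anomaly_score(steady))  # 0.0
    print(cloze_anomaly_score(abrupt))  # 0.125
```

The abrupt sequence scores higher because its last patch cannot be recovered from its neighbors. In the approach described above, a learned model would take the place of `mean_patch_predictor` and would also infer the erased patch's optical flow.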