Fusing Crops Representation into Snippet Via Mutual Learning for Weakly Supervised Surveillance Anomaly Detection

Bohua Zhang, Jianru Xue
DOI: https://doi.org/10.1049/cvi2.12289
IF: 1.484
2024-01-01
IET Computer Vision
Abstract: In recent years, the challenge of detecting anomalies in real-world surveillance videos using weakly supervised data has emerged. Traditional methods, utilising multi-instance learning (MIL) with video snippets, struggle with background noise and tend to overlook subtle anomalies. To tackle this, the authors propose a novel approach that crops snippets to create multiple instances with less noise, evaluates them separately and then fuses these evaluations for more precise anomaly detection. This method, however, leads to higher computational demands, especially during inference. To address this, the authors employ mutual learning to guide snippet feature training using these low-noise crops. During training, the authors integrate MIL for the primary task, with snippets as inputs, and multiple-multiple instance learning (MMIL) for an auxiliary task, with crops as inputs. The authors' approach ensures consistent multi-instance results in both tasks and incorporates a temporal activation mutual learning module (TAML) that aligns temporal anomaly activations between snippets and crops, improving the overall quality of snippet representations. Additionally, a snippet feature discrimination enhancement module (SFDE) further refines the snippet features. Tested across various datasets, the authors' method achieves a frame-level AUC of 85.78% on the UCF-Crime dataset while reducing computational costs at inference.
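The abstract names three ingredients: fusing crop-level scores into a snippet-level score, a MIL objective over snippets, and a consistency term between the snippet and crop branches. The sketch below illustrates these ideas in generic form only. It is not the authors' implementation: the function names, the max-over-crops fusion, the top-k hinge ranking loss, and the mean-squared consistency term are all assumptions chosen for illustration, and the paper's MMIL, TAML, and SFDE modules may be defined differently.

```python
from typing import List, Sequence


def top_k_mean(scores: Sequence[float], k: int) -> float:
    """Mean of the k largest scores (k is clamped to the sequence length)."""
    if not scores:
        raise ValueError("scores must be non-empty")
    k = max(1, min(k, len(scores)))
    return sum(sorted(scores, reverse=True)[:k]) / k


def fuse_crop_scores(crop_scores: Sequence[Sequence[float]]) -> List[float]:
    """Fuse per-crop anomaly scores into one score per snippet.

    Assumption: fusion is a max over the crops of each snippet.
    """
    return [max(crops) for crops in crop_scores]


def mil_ranking_loss(
    abnormal_scores: Sequence[float],
    normal_scores: Sequence[float],
    k: int = 3,
    margin: float = 1.0,
) -> float:
    """Hinge ranking loss between an abnormal and a normal video.

    Assumption: each video is summarised by the mean of its top-k
    snippet scores, a common choice in weakly supervised MIL.
    """
    gap = top_k_mean(abnormal_scores, k) - top_k_mean(normal_scores, k)
    return max(0.0, margin - gap)


def consistency_loss(
    snippet_scores: Sequence[float],
    fused_crop_scores: Sequence[float],
) -> float:
    """Mean squared difference between the two branches' temporal scores.

    Assumption: a simple stand-in for aligning snippet and crop activations.
    """
    if len(snippet_scores) != len(fused_crop_scores):
        raise ValueError("both branches must cover the same snippets")
    if not snippet_scores:
        raise ValueError("scores must be non-empty")
    squared = [(s - c) ** 2 for s, c in zip(snippet_scores, fused_crop_scores)]
    return sum(squared) / len(squared)
```

In a setup like this, the crop branch would only be evaluated during training; at inference, the snippet scores alone would be used, which is the source of the computational saving the abstract describes.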