Human-Scene Network: A novel baseline with self-rectifying loss for weakly supervised video anomaly detection

Snehashis Majhi, Rui Dai, Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, François Brémond
DOI: https://doi.org/10.1016/j.cviu.2024.103955
IF: 4.886
2024-02-17
Computer Vision and Image Understanding
Abstract: Video anomaly detection in surveillance systems with only video-level labels (i.e., weakly supervised) is challenging. This is due to (i) the complex integration of a large variety of scenarios, including human- and scene-based anomalies characterized by subtle or sharp spatio-temporal cues in real-world videos, and (ii) suboptimal optimization between normal and anomaly instances under weak supervision. In this paper, we propose a Human-Scene Network to learn discriminative representations by capturing both subtle and strong cues in a dissociative manner. In addition, a self-rectifying loss is proposed that dynamically computes pseudo-temporal annotations from video-level labels to optimize the Human-Scene Network effectively. The proposed Human-Scene Network, optimized with the self-rectifying loss, is validated on three publicly available datasets, i.e., UCF-Crime, ShanghaiTech, and IITB-Corridor, outperforming recently reported state-of-the-art approaches in five of the six scenarios considered.
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic