Abstract: Multi-task learning based video anomaly detection methods combine multiple proxy tasks in different branches to detect video anomalies in different situations. Most existing methods either do not combine complementary tasks to effectively cover all motion patterns, or the class of the objects is not explicitly considered. To address the aforementioned shortcomings, we propose a novel multi-task learning based method that combines complementary proxy tasks to better consider the motion and appearance features. We combine the semantic segmentation and future frame prediction tasks in a single branch to learn the object class and consistent motion patterns, and to detect respective anomalies simultaneously. In the second branch, we added several attention mechanisms to detect motion anomalies with attention to object parts, the direction of motion, and the distance of the objects from the camera. Our qualitative results show that the proposed method considers the object class effectively and learns motion with attention to the aforementioned important factors, which results in precise motion modeling and better motion anomaly detection. Additionally, quantitative results show the superiority of our method compared with state-of-the-art methods.

What problem does this paper attempt to address?

This paper addresses several shortcomings of existing multi-task learning methods for video anomaly detection:

1. **Proxy tasks are not complementary and lack interpretability**: In existing methods, the combined proxy tasks often do not complement one another, and the combination is difficult to interpret.
2. **Object categories are not effectively considered**: Most methods do not fully account for the impact of object categories on anomaly detection.
3. **Not all motion anomalies are covered**: Existing methods fail to comprehensively cover the various kinds of motion anomalies.
4. **Context information is underused**: During anomaly detection, context information (such as object parts, motion directions, and distances) is not fully exploited.

To solve these problems, the authors propose a new video anomaly detection method based on multi-task learning. The method combines three complementary proxy tasks to consider appearance and motion features more comprehensively, thereby improving the accuracy of anomaly detection. The main contributions are:

- **Proposing a new multi-task learning framework**: The framework combines three proxy tasks, namely "future frame prediction", "semantic segmentation", and "optical flow magnitude prediction", to consider appearance and motion features more comprehensively.
- **Introducing the future semantic segmentation prediction task**: The semantic segmentation and future frame prediction tasks are combined into a new task, future semantic segmentation prediction, which is used to detect appearance and motion anomalies simultaneously.
- **Designing a new attention mechanism**: By introducing spatial and channel attention networks together with a new attention network, the model estimates the motion magnitudes of objects more accurately and accounts for factors such as object parts, motion directions, and distances.

### Formula Representation

The following are some of the formulas involved in the paper:

1. **Calculating the direction and magnitude of optical flow**:

   \[ \text{Mag}, \text{Ang} = \text{OF}(I_{t-1}, I_t) \]

   Here, \(\text{Mag}\) represents the magnitude of the optical flow, and \(\text{Ang}\) represents the angle of motion relative to the horizontal axis.

2. **Calculating motion-direction features**:

   \[ X = |\cos(\text{Ang})| \]

   \[ Y = |\sin(\text{Ang})| \]

3. **Calculating the anomaly score**:

   \[ S(t) = \sum |\text{Out}_{\text{student}}(I_t) - \text{Out}_{\text{teacher}}(I_t)| \]

   Here, \(\text{Out}_{\text{student}}(I_t)\) and \(\text{Out}_{\text{teacher}}(I_t)\) represent the outputs of the student network and the teacher network respectively, and the summation is carried out over all pixels in the anomaly map.

4. **Applying the Savitzky-Golay filter for temporal denoising**:

   \[ S_r(t) = \frac{1}{N} \sum_{i=-w}^{w} \alpha_i S(t+i) \]

   Here, \(S_r(t)\) represents the denoised anomaly score, \(N\) is the normalization factor, \(\alpha_i\) are the convolution coefficients, and \(w\) sets the window size (the sum spans \(2w+1\) frames).

Through these improvements, the method can identify abnormal events in video more accurately, especially in complex scenarios.
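The formulas above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the optical flow is assumed to be already available as per-pixel `(u, v)` vectors, the Savitzky-Golay coefficients are the standard 5-point quadratic ones (`[-3, 12, 17, 12, -3]` with `N = 35`) because the summary does not state the window the authors used, and edge frames are handled by repeating the nearest score.

```python
import math

# Assumption: standard 5-point quadratic Savitzky-Golay coefficients
# (alpha_i for i = -2..2) and their normalization factor N.
SG_ALPHA = [-3, 12, 17, 12, -3]
SG_NORM = 35


def flow_to_mag_ang(u, v):
    """Magnitude and angle (radians, relative to the horizontal axis) of one flow vector."""
    return math.hypot(u, v), math.atan2(v, u)


def direction_features(ang):
    """Motion-direction features X = |cos(Ang)| and Y = |sin(Ang)|."""
    return abs(math.cos(ang)), abs(math.sin(ang))


def anomaly_score(student_out, teacher_out):
    """S(t): sum of absolute student/teacher differences over all pixels.

    Both arguments are 2D lists (rows of pixel values) of the same shape.
    """
    return sum(
        abs(s - t)
        for s_row, t_row in zip(student_out, teacher_out)
        for s, t in zip(s_row, t_row)
    )


def smooth_scores(scores, alpha=SG_ALPHA, norm=SG_NORM):
    """S_r(t) = (1/N) * sum_{i=-w..w} alpha_i * S(t+i) for every frame t."""
    w = len(alpha) // 2
    last = len(scores) - 1
    smoothed = []
    for t in range(len(scores)):
        acc = 0.0
        for i in range(-w, w + 1):
            # Assumption: indices outside the sequence reuse the nearest edge score.
            idx = min(max(t + i, 0), last)
            acc += alpha[i + w] * scores[idx]
        smoothed.append(acc / norm)
    return smoothed


if __name__ == "__main__":
    mag, ang = flow_to_mag_ang(3.0, 4.0)
    print(mag, direction_features(ang))
    print(anomaly_score([[1, 2], [3, 4]], [[1, 1], [5, 4]]))
    print(smooth_scores([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
```

Because the coefficients sum to `N`, a constant score sequence is left unchanged, and interior points of a linear trend are preserved; only short-lived fluctuations are damped. With a real pipeline, the `(u, v)` vectors would come from an optical flow estimator and the student and teacher outputs from the trained networks.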

Multi-Task Learning based Video Anomaly Detection with Attention

Attention-based anomaly detection in multi-view surveillance videos

Object-Guided and Motion-Refined Attention Network for Video Anomaly Detection

Future Video Prediction from a Single Frame for Video Anomaly Detection

Learning Task-Specific Representation for Video Anomaly Detection with Spatial-Temporal Attention

Object-based video anomaly detection using multi-attention and adaptive velocity attribute representation learning

Multi-Scale Video Anomaly Detection by Multi-Grained Spatio-Temporal Representation Learning

Video Anomaly Detection Based on Attention Mechanism

Attention-Driven Loss for Anomaly Detection in Video Surveillance

Learning Attention Augmented Spatial-temporal Normality for Video Anomaly Detection

Video anomaly detection based on attention and efficient spatio-temporal feature extraction

Video Anomaly Detection Via Progressive Learning of Multiple Proxy Tasks

Efficient Anomaly Detection Using Self-Supervised Multi-Cue Tasks

Sequential Attention Mechanism for Weakly Supervised Video Anomaly Detection

Contrastive Attention for Video Anomaly Detection

Multi-Channel Generative Framework and Supervised Learning for Anomaly Detection in Surveillance Videos

Anomalies cannot materialize or vanish out of thin air: A hierarchical multiple instance learning with position-scale awareness for video anomaly detection

Video Anomaly Detection Based on Spatio-Temporal Relationships among Objects

Multi-Scale Temporal Relations and Segmented Channel Attention for Video Anomaly Detection

Dual contrast discriminator with sharing attention for video anomaly detection

Probabilistic Multi-Task Learning for Visual Saliency Estimation in Video