Abstract: This paper presents an unsupervised video anomaly detection method based on Optical Flow decomposition and Spatio-Temporal feature learning (OFST). The method combines optical flow reconstruction with video frame prediction. The proposed OFST framework consists of two modules: a Multi-Granularity Memory-augmented Autoencoder with Optical Flow Decomposition (MG-MemAE-OFD) and a Two-Stream Network based on Spatio-Temporal feature learning (TSN-ST). The MG-MemAE-OFD module has three functional blocks: optical flow decomposition, an autoencoder, and multi-granularity memory networks. The optical flow decomposition block extracts the main motion information of objects from the optical flow, and the multi-granularity memory networks memorize normal patterns and improve the quality of the reconstructions. To predict video frames, we introduce TSN-ST, which uses parallel standard Transformer blocks and a temporal block to learn spatiotemporal features from video frames and optical flows. OFST couples the two modules so that the larger reconstruction error of abnormal samples further increases their prediction error, whereas normal samples obtain low reconstruction and prediction errors. This contrast greatly improves the anomaly detection capability of the method. We evaluated the proposed model on public datasets. In terms of the area under the curve (AUC), it achieved 85.74% on the Ped1 dataset, 99.62% on the Ped2 dataset, 93.89% on the Avenue dataset, and 76.0% on the ShanghaiTech dataset. Our experimental results show an average improvement of 1.2% over the current state of the art.
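
To make the scoring idea concrete, the following is a minimal sketch of how per-frame reconstruction and prediction errors could be fused into an anomaly score and evaluated with frame-level AUC. It is an illustration only, not the paper's implementation: the min-max normalization, the equal weights `w_recon` and `w_pred`, the helper names, and the toy error values are all assumptions, since the abstract does not specify how the two errors are combined.

```python
import numpy as np

def min_max_normalize(x):
    """Scale values to [0, 1]; a constant input maps to all zeros."""
    x = np.asarray(x, dtype=np.float64)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return (x - x.min()) / span

def anomaly_scores(recon_err, pred_err, w_recon=0.5, w_pred=0.5):
    """Weighted sum of normalized errors (weights are hypothetical)."""
    return (w_recon * min_max_normalize(recon_err)
            + w_pred * min_max_normalize(pred_err))

def frame_auc(scores, labels):
    """Frame-level AUC: probability that an abnormal frame scores higher
    than a normal frame, with ties counted as one half."""
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one normal and one abnormal frame")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Toy per-frame errors (made up): the last two frames are abnormal.
recon_err = [0.10, 0.12, 0.11, 0.90, 0.85]
pred_err = [0.20, 0.22, 0.21, 0.95, 0.80]
labels = [0, 0, 0, 1, 1]

scores = anomaly_scores(recon_err, pred_err)
print(frame_auc(scores, labels))
```

In this toy input both abnormal frames have larger errors than every normal frame, so the fused scores separate the two groups completely. The pairwise AUC computation is quadratic in the number of frames, so a rank-based implementation would be preferable for full datasets.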

Spatial-Temporal Graph Convolutional Network Boosted Flow-Frame Prediction for Video Anomaly Detection

Memory-Augmented Spatial-Temporal Consistency Network for Video Anomaly Detection

Stochastic video normality network for abnormal event detection in surveillance videos

Video Anomaly Detection Based on Global–Local Convolutional Autoencoder

A Novel Unsupervised Video Anomaly Detection Framework Based on Optical Flow Reconstruction and Erased Frame Prediction

Tam-Net: Temporal Enhanced Appearance-To-Motion Generative Network For Video Anomaly Detection

Multi-scale Spatial-temporal Interaction Network for Video Anomaly Detection

Learning Appearance-motion Normality for Video Anomaly Detection

Integrated Multiscale Appearance Features and Motion Information Prediction Network for Anomaly Detection

DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection

Video Anomaly Detection Via Successive Image Frame Prediction Leveraging Optical Flows

Rethinking Prediction-Based Video Anomaly Detection from Local-Global Normality Perspective

Spatiotemporal consistency-enhanced network for video anomaly detection

Memory Enhanced Spatial-Temporal Graph Convolutional Autoencoder for Human-Related Video Anomaly Detection

An unsupervised video anomaly detection method via Optical Flow decomposition and Spatio-Temporal feature learning

Adaptive Graph Convolutional Networks for Weakly Supervised Anomaly Detection in Videos

Configurable Spatial-Temporal Hierarchical Analysis for Flexible Video Anomaly Detection

A novel spatio-temporal memory network for video anomaly detection

Video Anomaly Detection and Localization Based on an Adaptive Intra-Frame Classification Network

Learning Attention Augmented Spatial-temporal Normality for Video Anomaly Detection