Abstract: Violent behavior detection (VioBD), as a special action recognition task, aims to detect violent behaviors in videos, such as mutual fighting and assault. Some progress has been made in violence detection research, but existing methods have poor real-time performance, and their accuracy is limited by interference from complex backgrounds and occlusion in dense crowds. To address these problems, we propose an end-to-end real-time violence detection framework based on 2D CNNs. First, we propose a lightweight skeletal image (SI) as the input modality, which captures human body posture and richer contextual information while removing background interference. In our tests, the SI modality reaches the same accuracy as the RGB modality at only one-third of its resolution, which greatly improves the real-time performance of model training and inference; at the same resolution, the SI modality achieves higher accuracy. Second, we design a parallel prediction module (PPM) that simultaneously obtains single-image detection results and the inter-frame motion information of the video, improving real-time performance compared with the traditional "detect the image first, understand the video later" mode. In addition, we propose an auxiliary parameter generation module (APGM) that balances efficiency and accuracy. APGM is a 2D CNN-based video understanding module that weights the spatial information of video features, and its processing speed reaches 30–40 frames per second. Compared with models such as CNN-LSTM (Iqrar et al.: CNN-LSTM based smart real-time video surveillance system. In: 2022 14th International Conference on Mathematics, Actuarial Science, Computer Science and Statistics (MACS), pages 1–5. IEEE, 2022) and the pose-based method of Ludl et al. (Simple yet efficient real-time pose-based action recognition. In: 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pages 581–588. IEEE, 2019), the propagation speed can be increased by an average of frames per second per group of clips, which further improves the efficiency and accuracy of video motion detection and greatly improves real-time performance. Experiments on several challenging benchmarks show that RVBDN maintains excellent speed and accuracy in long-term interactions and meets the real-time requirements of violence detection and spatio-temporal action detection. Finally, we update our newly proposed violence detection image dataset (Violence Image Dataset). The dataset is available at https://github.com/ChinaZhangPeng/Violence-Image-Dataset
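The skeletal image (SI) idea can be illustrated with a minimal sketch, assuming 2D keypoints in the 17-point COCO layout have already been produced by some pose estimator (the abstract does not name one). The limb list, the single-channel canvas, and the helper names `skeletal_image` and `draw_line` are hypothetical choices for illustration, not the authors' implementation. Every detected person's skeleton is drawn on a blank canvas, so no RGB background pixel reaches the network, and the `scale` argument renders the image at reduced resolution (for example 1/3 of the source frame).

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical limb topology in COCO-17 keypoint indexing (an assumption; the
# abstract does not specify the skeleton layout used for SI).
LIMBS: List[Tuple[int, int]] = [
    (5, 7), (7, 9),      # left shoulder-elbow-wrist
    (6, 8), (8, 10),     # right shoulder-elbow-wrist
    (5, 6), (11, 12),    # shoulder line, hip line
    (5, 11), (6, 12),    # torso sides
    (11, 13), (13, 15),  # left hip-knee-ankle
    (12, 14), (14, 16),  # right hip-knee-ankle
]

Keypoint = Optional[Tuple[float, float]]  # (x, y) in source pixels, None if missing
Canvas = List[List[int]]


def draw_line(canvas: Canvas, x0: int, y0: int, x1: int, y1: int, value: int) -> None:
    """Rasterise a segment with Bresenham's algorithm, skipping off-canvas pixels."""
    h, w = len(canvas), len(canvas[0])
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        if 0 <= x0 < w and 0 <= y0 < h:
            canvas[y0][x0] = value
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def skeletal_image(people: Sequence[Sequence[Keypoint]], width: int, height: int,
                   scale: float = 1 / 3) -> Canvas:
    """Draw every person's skeleton on a blank (all-zero) canvas.

    No pixel of the original RGB frame is copied, so background clutter is
    discarded by construction; `scale` shrinks the output resolution.
    """
    out_w, out_h = max(1, round(width * scale)), max(1, round(height * scale))
    canvas = [[0] * out_w for _ in range(out_h)]
    for person in people:
        for a, b in LIMBS:
            pa, pb = person[a], person[b]
            if pa is None or pb is None:
                continue  # skip limbs whose endpoint was not detected
            draw_line(canvas, int(pa[0] * scale), int(pa[1] * scale),
                      int(pb[0] * scale), int(pb[1] * scale), 255)
    return canvas
```

A real pipeline would more likely use per-limb colours or several channels and a vectorised drawing routine; a single intensity value and pure Python keep the sketch short and dependency-free.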
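The contrast between a sequential "detect the image first, understand the video later" pipeline and a parallel one can be sketched in a similarly hedged way. The toy code below is not the paper's PPM or APGM, whose internals the abstract does not describe. It assumes single-channel per-frame feature maps from a 2D CNN backbone and shows, in one pass over a clip, a per-frame score (the mean activation, standing in for a detection head) computed alongside a spatial weight map derived from inter-frame differences, which is then used to reweight the features. All names (`parallel_predict`, `reweight`) are hypothetical.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[float]]  # one-channel H x W map (simplification; real CNNs have C channels)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def parallel_predict(clip: List[FeatureMap]) -> Tuple[List[float], FeatureMap]:
    """Toy parallel step (hypothetical, not the paper's PPM).

    From the same per-frame features it returns, in one pass:
      * one score per frame (mean activation, standing in for a per-image head);
      * a spatial weight map: sigmoid of the mean absolute difference between
        consecutive frames at each location (a crude inter-frame motion cue).
    """
    t, h, w = len(clip), len(clip[0]), len(clip[0][0])
    frame_scores = [sum(map(sum, f)) / (h * w) for f in clip]
    weights = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            diffs = [abs(clip[i][y][x] - clip[i - 1][y][x]) for i in range(1, t)]
            motion = sum(diffs) / len(diffs) if diffs else 0.0
            weights[y][x] = sigmoid(motion)
    return frame_scores, weights


def reweight(clip: List[FeatureMap], weights: FeatureMap) -> List[FeatureMap]:
    """Multiply every frame element-wise by the shared spatial weight map
    (an assumed, APGM-like spatial weighting)."""
    return [[[v * wt for v, wt in zip(row, wrow)] for row, wrow in zip(frame, weights)]
            for frame in clip]
```

Locations that do not change between frames receive a weight of sigmoid(0) = 0.5, while locations with motion receive larger weights, so the reweighting emphasises moving regions using only per-frame 2D features, without recurrent or 3D convolutional layers.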

Human skeletons and change detection for efficient violence detection in surveillance videos

Efficient Human Violence Recognition for Surveillance in Real Time

Toward Fast and Accurate Violence Detection for Automated Video Surveillance Applications

ESTS‐GCN: An Ensemble Spatial–Temporal Skeleton‐Based Graph Convolutional Networks for Violence Detection

Mobile Neural Architecture Search Network and Convolutional Long Short-Term Memory-Based Deep Features Toward Detecting Violence from Video

A Skeleton-based Approach for Campus Violence Detection

Crime scene classification from skeletal trajectory analysis in surveillance settings

A Frame-Based Feature Model for Violence Detection from Surveillance Cameras Using ConvLSTM Network

Efficient Violence Detection in Surveillance

Robust Activity Recognition Based on Human Skeleton for Video Surveillance

An ensemble based approach for violence detection in videos using deep transfer learning

A real time crime scene intelligent video surveillance systems in violence detection framework using deep learning techniques

DIFEM: Key-points Interaction based Feature Extraction Module for Violence Recognition in Videos

Detecting Violence in Video Based on Deep Features Fusion Technique

Towards Real-world Violence Recognition via Efficient Deep Features and Sequential Patterns Analysis

Efficiently adapting large pre-trained models for real-time violence recognition in smart city surveillance

An end-to-end framework for real-time violent behavior detection based on 2D CNNs

Improving Video Violence Recognition with Human Interaction Learning on 3D Skeleton Point Clouds

Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM

Violence detection in videos using deep recurrent and convolutional neural networks

Two-stream Multi-dimensional Convolutional Network for Real-time Violence Detection