Abstract: Violent behavior detection (VioBD), a special action recognition task, aims to detect violent behaviors in videos, such as mutual fighting and assault. Some progress has been made in violence detection research. However, existing methods have poor real-time performance, and their accuracy is limited by interference from complex backgrounds and by occlusion in dense crowds. To address these problems, we propose an end-to-end real-time violence detection framework based on 2D CNNs. First, we propose a lightweight skeletal image (SI) as the input modality. It captures human body posture together with richer contextual information while removing background interference. In our tests, the SI modality reaches the same accuracy as the RGB modality at only one-third of the resolution, which greatly speeds up model training and inference. At the same resolution, the SI modality achieves higher accuracy. Second, we design a parallel prediction module (PPM) that simultaneously obtains single-image detection results and inter-frame motion information from the video. Compared with the traditional "detect the image first, understand the video later" pipeline, this improves the real-time performance of the algorithm. In addition, we propose an auxiliary parameter generation module (APGM) that balances efficiency and accuracy. APGM is a 2D CNN-based video understanding module that weights the spatial information of the video features, and its processing speed reaches 30–40 frames per second. We compare it with models such as CNN-LSTM (Iqrar et al., "CNN-LSTM based smart real-time video surveillance system," 2022 14th International Conference on Mathematics, Actuarial Science, Computer Science and Statistics (MACS), pp. 1–5, IEEE, 2022) and the method of Ludl et al. ("Simple yet efficient real-time pose-based action recognition," 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pp. 581–588, IEEE, 2019). Against these models, APGM increases the average propagation speed, measured in frames per second per group of clips. This further improves the efficiency and accuracy of video motion detection and greatly improves real-time performance. We conducted experiments on several challenging benchmarks. Our framework (RVBDN) maintains excellent speed and accuracy in long-term interactions and meets real-time requirements for both violence detection and spatio-temporal action detection. Finally, we release an updated version of our proposed violence detection image dataset (Violence Image Dataset). The dataset is available at https://github.com/ChinaZhangPeng/Violence-Image-Dataset
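The abstract describes the skeletal image (SI) only at a high level: pose information is kept and the background is discarded. One way to picture this is to rasterize 2D pose keypoints as limb segments on a blank canvas at a small output resolution. The sketch below assumes a COCO-style 17-keypoint layout with (x, y, confidence) per keypoint, a hypothetical limb list, and a confidence threshold. The function names `render_skeletal_image` and `draw_line` are illustrative and are not the authors' implementation.

```python
import numpy as np

# Hypothetical limb list for a COCO-style 17-keypoint layout (assumption, not from the paper).
COCO_LIMBS = [
    (5, 7), (7, 9), (6, 8), (8, 10),         # arms
    (5, 6), (5, 11), (6, 12), (11, 12),      # torso
    (11, 13), (13, 15), (12, 14), (14, 16),  # legs
]

def draw_line(canvas, p0, p1, value=255):
    """Rasterize a segment by sampling evenly spaced points along it."""
    h, w = canvas.shape
    n = int(max(abs(p1[0] - p0[0]), abs(p1[1] - p0[1]))) + 1
    xs = np.linspace(p0[0], p1[0], n).round().astype(int)
    ys = np.linspace(p0[1], p1[1], n).round().astype(int)
    keep = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)  # clip to the canvas
    canvas[ys[keep], xs[keep]] = value

def render_skeletal_image(people, src_size, out_size, conf_thresh=0.3):
    """Render every person's pose in one frame onto a black canvas.

    people:   array of shape (P, 17, 3) with (x, y, confidence) per keypoint,
              in pixel coordinates of a source frame of size src_size = (W, H).
    out_size: (W, H) of the rendered SI; it can be much smaller than the RGB frame.
    Returns a uint8 image of shape (H_out, W_out) in which background pixels are 0.
    """
    out_w, out_h = out_size
    src_w, src_h = src_size
    scale = np.array([out_w / src_w, out_h / src_h])
    canvas = np.zeros((out_h, out_w), dtype=np.uint8)
    for person in people:
        for a, b in COCO_LIMBS:
            # Skip limbs whose endpoints are occluded or detected with low confidence.
            if person[a, 2] < conf_thresh or person[b, 2] < conf_thresh:
                continue
            draw_line(canvas, person[a, :2] * scale, person[b, :2] * scale)
    return canvas
```

For example, a 640×480 frame in which only a left shoulder at (100, 100) and a left elbow at (100, 300) are confidently detected yields a 224×224 SI containing a single vertical segment. In a real pipeline, the keypoints would come from a pose estimator, which this sketch leaves out.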
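The abstract says that PPM extracts inter-frame motion information alongside per-image results, and that APGM weights the spatial information of the video features. It does not describe either module's internals. The sketch below illustrates, under assumptions, what motion-derived spatial weighting of per-frame 2D CNN feature maps can look like. It computes a mean absolute frame-difference map for a clip, normalizes that map to [0, 1], and applies it as a residual weight. The function `motion_weighted_features`, the frame-difference motion cue, and the `(1 + w)` residual form are hypothetical choices, not the authors' PPM or APGM.

```python
import numpy as np

def motion_weighted_features(clip_feats):
    """Hypothetical motion-based spatial weighting (illustrative only).

    clip_feats: array of shape (T, C, H, W) holding per-frame 2D CNN feature
                maps for one clip, with T >= 2.
    Returns (weighted_feats, weights), where weights has shape (H, W).
    """
    # Inter-frame motion cue: absolute difference between consecutive frames,
    # averaged over time and channels into a single (H, W) map for the clip.
    diffs = np.abs(np.diff(clip_feats, axis=0))  # (T-1, C, H, W)
    motion = diffs.mean(axis=(0, 1))             # (H, W)

    # Min-max normalize to [0, 1]. A completely static clip gets uniform weights of 1.
    span = motion.max() - motion.min()
    if span > 0:
        weights = (motion - motion.min()) / span
    else:
        weights = np.ones_like(motion, dtype=float)

    # Residual form (1 + w) emphasizes moving regions without zeroing static ones.
    weighted = clip_feats * (1.0 + weights)[None, None, :, :]
    return weighted, weights
```

The residual form is a design choice made here to keep static context, such as bystanders, in the features. A pure multiplicative mask would discard that context entirely.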

An end-to-end framework for real-time violent behavior detection based on 2D CNNs

Real-Time Target Detection and Recognition with Deep Convolutional Networks for Intelligent Visual Surveillance

Conv3D-Based Video Violence Detection Network Using Optical Flow and RGB Data

Two-stream Multi-dimensional Convolutional Network for Real-time Violence Detection

A spatio-temporal model for violence detection based on spatial and temporal attention modules and 2D CNNs

An Overview of Violence Detection Techniques: Current Challenges and Future Directions

Towards Real-world Violence Recognition via Efficient Deep Features and Sequential Patterns Analysis

Violent Interaction Detection in Video Based on Deep Learning

Toward Fast and Accurate Violence Detection for Automated Video Surveillance Applications

Mobile Neural Architecture Search Network and Convolutional Long Short-Term Memory-Based Deep Features Toward Detecting Violence from Video

2D bidirectional gated recurrent unit convolutional neural networks for end-to-end violence detection in videos

VD-Net: An Edge Vision-Based Surveillance System for Violence Detection

Real-Time Violence Detection Using CNN-LSTM

Efficiently adapting large pre-trained models for real-time violence recognition in smart city surveillance

A Frame-Based Feature Model for Violence Detection from Surveillance Cameras Using ConvLSTM Network

Utilizing Deep Learning Models to Develop a Human Behavior Recognition System for Vision-Based School Violence Detection

Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision

Real time violence detection in surveillance videos using Convolutional Neural Networks

A CNN-RNN Combined Structure for Real-World Violence Detection in Surveillance Cameras

A Next-Gen Real-Time Video Alert System with Machine Learning Sensitivity

RWF-2000: An Open Large Scale Video Database for Violence Detection