Abstract: Detecting violence across varied scenarios is a difficult task that requires a high degree of generalisation, since fights occur in different environments such as schools, streets, and football stadiums. However, most current research on violence detection focuses on a single scenario, which limits its ability to generalise across multiple scenarios. To tackle this issue, this paper proposes a new multi-scenario violence detection framework that operates in two environments: fighting in various locations and rugby stadiums. The framework has three main steps. Firstly, it applies transfer learning using three models pre-trained on the ImageNet dataset: Xception, Inception, and InceptionResNet. This approach enhances generalisation and helps prevent overfitting, as these models have already learned valuable features from a large and diverse dataset. Secondly, the framework combines the features extracted by the three models through feature fusion, which improves feature representation and enhances performance. Lastly, the concatenation step combines the features of the first violence scenario with those of the second scenario to train a machine learning classifier, enabling the classifier to generalise across both scenarios. This concatenation framework is highly flexible, as it can incorporate additional violence scenarios without requiring training from scratch. The Fusion model, which incorporates feature fusion from multiple models, obtained an accuracy of 97.66% on the RLVS dataset and 92.89% on the Hockey dataset. The Concatenation model achieved an accuracy of 97.64% on the RLVS dataset and 92.41% on the Hockey dataset with just a single classifier. This is the first framework that allows for the classification of multiple violent scenarios within a single classifier. Furthermore, this framework is not limited to violence detection and can be adapted to different tasks.
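The three steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the pre-trained backbones are replaced by a hypothetical stand-in that returns synthetic features, the feature widths in `BACKBONE_DIMS` are assumed values, and the nearest-centroid rule is only a placeholder, since the abstract does not name the machine learning classifier. The sketch shows only the data flow: per-backbone features are fused along the feature axis, and the two scenarios are then concatenated along the sample axis before a single classifier is fitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed feature widths for the three backbones (illustrative values only).
BACKBONE_DIMS = {"xception": 2048, "inception": 2048, "inception_resnet": 1536}


def fake_backbone_features(labels, dim, shift=1.0):
    """Hypothetical stand-in for a frozen pre-trained CNN.

    Returns Gaussian noise plus a class-dependent offset so that the
    two classes are separable; no real network is involved.
    """
    return rng.normal(size=(labels.size, dim)) + shift * labels[:, None]


def fuse(feature_blocks):
    """Feature fusion: join per-backbone features along the feature axis."""
    return np.concatenate(feature_blocks, axis=1)


def make_scenario(n_samples):
    """Build fused features and labels (0 = non-violent, 1 = violent)."""
    labels = np.arange(n_samples) % 2  # alternate classes deterministically
    blocks = [fake_backbone_features(labels, d) for d in BACKBONE_DIMS.values()]
    return fuse(blocks), labels


# Two synthetic scenarios standing in for the two violence datasets.
X_a, y_a = make_scenario(60)
X_b, y_b = make_scenario(40)

# Scenario concatenation: stack both scenarios along the sample axis.
X_all = np.concatenate([X_a, X_b], axis=0)
y_all = np.concatenate([y_a, y_b], axis=0)

# Placeholder classifier: one centroid per class, fitted on both scenarios.
centroids = np.stack([X_all[y_all == c].mean(axis=0) for c in (0, 1)])


def predict(X):
    """Assign each sample to the class of its nearest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


train_accuracy = (predict(X_all) == y_all).mean()
print(f"fused width: {X_all.shape[1]}, training accuracy: {train_accuracy:.2f}")
```

Under this layout, adding a further scenario amounts to appending its fused features to `X_all` and refitting only the lightweight classifier, which is one way to read the flexibility claimed for the concatenation step.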

Novel Deep Feature Fusion Framework for Multi-Scenario Violence Detection
