Abstract: Detecting violence across varied scenarios is difficult because it requires a high degree of generalisation: fights occur in different environments, such as schools, streets, and football stadiums. However, most current research on violence detection focuses on a single scenario, which limits how well the resulting models generalise to other scenarios. To address this issue, this paper proposes a multi-scenario violence detection framework that operates in two environments: fighting in various locations and rugby stadiums. The framework has three main steps. First, it applies transfer learning using three models pre-trained on the ImageNet dataset: Xception, Inception, and InceptionResNet. This approach enhances generalisation and helps prevent overfitting, as these models have already learned valuable features from a large and diverse dataset. Second, the framework combines the features extracted by the three models through feature fusion, which improves feature representation and enhances performance. Third, a concatenation step combines the features of the first violence scenario with those of the second to train a machine learning classifier, enabling the classifier to generalise across both scenarios. This concatenation framework is flexible: it can incorporate additional violence scenarios without requiring training from scratch. The Fusion model, which incorporates feature fusion from multiple models, obtained an accuracy of 97.66% on the RLVS dataset and 92.89% on the Hockey dataset. The Concatenation model achieved an accuracy of 97.64% on the RLVS dataset and 92.41% on the Hockey dataset with a single classifier. This is the first framework that allows for the classification of multiple violent scenarios within a single classifier. Furthermore, the framework is not limited to violence detection and can be adapted to different tasks.
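The fusion and concatenation steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the random arrays stand in for features that would come from the three frozen pre-trained backbones, the feature widths (8, 8, 6) and clip counts are hypothetical, and logistic regression is an assumed stand-in for the unspecified machine learning classifier.

```python
import numpy as np

rng = np.random.default_rng(0)


def fake_backbone_features(labels, dim, shift=1.5):
    """Hypothetical stand-in for features from one frozen pre-trained model.

    Violent clips (label 1) are shifted so the synthetic classes are separable.
    """
    noise = rng.normal(size=(labels.shape[0], dim))
    return noise + shift * labels[:, None]


def fuse(feature_sets):
    """Feature fusion: join per-model features along the feature axis."""
    return np.concatenate(feature_sets, axis=1)


def make_scenario(n_clips, dims=(8, 8, 6)):
    """Build fused features for one scenario (dims are assumed widths)."""
    labels = rng.integers(0, 2, size=n_clips)
    per_model = [fake_backbone_features(labels, d) for d in dims]
    return fuse(per_model), labels


def train_logreg(X, y, lr=0.1, epochs=300):
    """Plain gradient-descent logistic regression (assumed classifier)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def predict(X, w, b):
    return (X @ w + b > 0).astype(int)


# Two scenarios, each with features fused across the three stand-in models.
X_a, y_a = make_scenario(200)   # e.g. fights in various locations
X_b, y_b = make_scenario(150)   # e.g. stadium footage

# Concatenation step: stack both scenarios along the sample axis
# and train a single classifier on the combined set.
X_all = np.concatenate([X_a, X_b], axis=0)
y_all = np.concatenate([y_a, y_b])
w, b = train_logreg(X_all, y_all)

acc_a = (predict(X_a, w, b) == y_a).mean()
acc_b = (predict(X_b, w, b) == y_b).mean()
print(f"scenario A accuracy: {acc_a:.3f}, scenario B accuracy: {acc_b:.3f}")
```

Note the two different axes: fusion widens each clip's feature vector, while scenario concatenation adds more clips. Because only the lightweight classifier is trained, a further scenario could be added by appending its fused features and refitting, without retraining the backbones.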

Fudan-NJUST at MediaEval 2014: Violent Scenes Detection Using Deep Neural Networks

Fudan-Huawei at MediaEval 2015: Detecting Violent Scenes and Affective Impact in Movies with Deep Learning

Fudan at MediaEval 2013: Violent Scenes Detection Using Motion Features and Part-Level Attributes

The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Trajectory-based Features

Detecting Violence in Video Based on Deep Features Fusion Technique

Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision

Detecting Violence in Video using Subclasses

Violent Video Detection Based on Semantic Correspondence

Benchmarking Violent Scenes Detection in Movies

Violent Interaction Detection in Video Based on Deep Learning

Mobile Neural Architecture Search Network and Convolutional Long Short-Term Memory-Based Deep Features Toward Detecting Violence from Video

Look, Listen and Pay More Attention: Fusing Multi-Modal Information for Video Violence Detection

Semantic Multimodal Violence Detection Based on Local-to-global Embedding

Novel Deep Feature Fusion Framework for Multi-Scenario Violence Detection

Look and Listen: A Multi-modality Late Fusion Approach to Scene Classification for Autonomous Machines

MMANN: Multimodal Multilevel Attention Neural Network for Horror Clip Detection

Detecting Violent Scenes in Movies by Auditory and Visual Cues

Video Dynamics Detection Using Deep Neural Networks

Exploring Inter-feature and Inter-class Relationships with Deep Neural Networks for Video Classification

Violence detection and face recognition based on deep learning

Audiovisual Dependency Attention for Violence Detection in Videos