Abstract: Surveillance video analysis using automated AI-based techniques is a prominent research field with real-world applications. Several techniques for recognizing activities, behaviour, and violent actions have been proposed in the literature. Violence detection on real-world data remains a major challenge because few training datasets cover complex scenarios and objects that appear at varied scales while performing different activities. In this paper, we present a novel deep learning model built on specially designed frame encoders for spatial feature extraction that generalize across many challenges, such as varying lighting conditions and indoor and outdoor scenes. The stacked spatial features are then analyzed by a temporal deep learning model that learns temporal dependencies from both past and future information when predicting the violent or normal class. In the literature, violence detection is typically treated as a binary classification of fight versus no fight. In contrast, we present a multi-class classification of violent activities that covers different types of human violence, such as assault and shooting. The data for multi-class violence classification is extracted from the well-known real-world anomaly detection dataset UCF-Crime, from which only six human-involved anomaly types are used to prepare the training and testing sets. We report accuracies of 41.1% and 48.85% for sequences of 16 and 8 frames, respectively. These results show that more research is needed to increase the robustness of deep models in violence detection. Finally, on violence detection datasets, the proposed model's results are marginally better than those of recent methods, indicating the effectiveness of the proposed feature extraction and temporal learning mechanism.
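The abstract reports results for sequences of 16 and 8 frames but does not say how a video is cut into such sequences. The sketch below shows one plausible way to do it and is not the authors' procedure. It assumes fixed-length windows, non-overlapping by default, with leftover frames at the end of the video discarded. The function name `split_into_sequences` and its `stride` parameter are hypothetical and introduced only for illustration.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def split_into_sequences(
    frames: Sequence[T], seq_len: int, stride: Optional[int] = None
) -> List[List[T]]:
    """Cut a list of frames into fixed-length sequences.

    Assumption (not stated in the abstract): windows are non-overlapping
    unless a smaller stride is given, and trailing frames that do not fill
    a whole window are dropped.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    step = seq_len if stride is None else stride
    if step <= 0:
        raise ValueError("stride must be positive")

    sequences = []
    # The last valid start index is len(frames) - seq_len.
    for start in range(0, len(frames) - seq_len + 1, step):
        sequences.append(list(frames[start:start + seq_len]))
    return sequences


# Stand-in for decoded frames: integers instead of image arrays.
video = list(range(40))
seqs_16 = split_into_sequences(video, seq_len=16)
seqs_8 = split_into_sequences(video, seq_len=8)
```

Under these assumptions, each resulting sequence would be passed through the frame encoder and then the temporal model. A shorter sequence length yields more, shorter training samples from the same video.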

DIFEM: Key-points Interaction based Feature Extraction Module for Violence Recognition in Videos

Detecting Violence in Video Based on Deep Features Fusion Technique

Mobile Neural Architecture Search Network and Convolutional Long Short-Term Memory-Based Deep Features Toward Detecting Violence from Video

Efficient Human Violence Recognition for Surveillance in Real Time

Feature Fusion Based Deep Spatiotemporal Model For Violence Detection In Videos

Toward Fast and Accurate Violence Detection for Automated Video Surveillance Applications

Violence detection in surveillance video using low-level features

A real time crime scene intelligent video surveillance systems in violence detection framework using deep learning techniques

Multi-frame Feature-Fusion-based Model for Violence Detection

An ensemble based approach for violence detection in videos using deep transfer learning

A SlowFast-Based Violence Recognition Method

Human skeletons and change detection for efficient violence detection in surveillance videos

Violence Detection Using Oriented VIolent Flows

DeepSafety: Multi-level Audio-Text Feature Extraction and Fusion Approach for Violence Detection in Conversations

Recognizing Violent Activity Without Decoding Video Streams

Violent Interaction Detection in Video Based on Deep Learning

Violence detection and face recognition based on deep learning

Towards Real-world Violence Recognition via Efficient Deep Features and Sequential Patterns Analysis

Efficient Violence Detection in Surveillance

A Skeleton-based Approach for Campus Violence Detection

Conv3D-Based Video Violence Detection Network Using Optical Flow and RGB Data