Abstract: Traditional methods of violence detection in public spaces often suffer from low accuracy, limited real-time capability, and an inability to model complex spatiotemporal patterns. Their reliance on rule-based systems makes it hard to distinguish violent from non-violent activity and to adapt to diverse scenarios, and their alert channels can be slow and inefficient. To address these limitations, we present a deep neural network architecture that combines a pre-trained Darknet19 model with Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models. Darknet19 first extracts spatial features from the video frames. These features are used to train the CNN, which captures the temporal attributes of the video sequences. The resulting temporal features are passed to the LSTM, which classifies each video as violent or non-violent. We validate the model on the Fight dataset. The system also includes a multi-channel alert mechanism that sends alerts to the relevant law enforcement agencies via WhatsApp, Telegram, and e-mail, giving them real-time, actionable information. The proposed system achieves an accuracy of 96%, which is higher than that of other state-of-the-art models.
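
The alert step described above can be illustrated with a minimal sketch. The abstract does not say how classifier outputs are turned into alerts, so everything here is an assumption made for illustration: the `AlertDispatcher` class, the 0.5 threshold, the rule that three consecutive clips must be flagged before an alert fires, and the placeholder senders that stand in for real WhatsApp, Telegram, and e-mail clients.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical sender type: takes an alert message, returns nothing.
Sender = Callable[[str], None]


@dataclass
class AlertDispatcher:
    """Fan out one alert per sustained run of high violence scores."""
    senders: Dict[str, Sender]
    threshold: float = 0.5      # assumed decision threshold, not from the paper
    min_consecutive: int = 3    # assumed debounce length, not from the paper
    _streak: int = field(default=0, init=False)

    def update(self, clip_index: int, p_violence: float) -> List[str]:
        """Feed one per-clip score; return the channels that were notified."""
        if p_violence < self.threshold:
            self._streak = 0
            return []
        self._streak += 1
        if self._streak != self.min_consecutive:
            # Still warming up, or this streak has already raised an alert.
            return []
        message = f"Possible violence at clip {clip_index} (p={p_violence:.2f})"
        notified = []
        for name, send in self.senders.items():
            try:
                send(message)
                notified.append(name)
            except Exception:
                # A failing channel should not block the remaining ones.
                continue
        return notified


# Placeholder senders that record messages instead of calling real services.
outbox = []

def make_sender(channel: str) -> Sender:
    def send(message: str) -> None:
        outbox.append((channel, message))
    return send

dispatcher = AlertDispatcher(
    {name: make_sender(name) for name in ("whatsapp", "telegram", "email")}
)

# Made-up per-clip probabilities standing in for the classifier output.
scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.9, 0.95, 0.99]
fired = [i for i, p in enumerate(scores) if dispatcher.update(i, p)]
print(fired, len(outbox))
```

Requiring several consecutive flagged clips is one simple way to keep a single misclassified clip from paging law enforcement. Alerting only once per streak avoids repeating the same message while an incident is still in progress.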
