Utilizing Deep Learning Models to Develop a Human Behavior Recognition System for Vision-Based School Violence Detection

Thanh Phat Pham, Viet Cuong Pham, Son Phuc Phan, Huy Hieu Vu, Tan Trinh Nguyen
DOI: https://doi.org/10.1109/GTSD62346.2024.10674972
2024-07-25
Abstract: School violence is a serious problem in schools around the world and is becoming increasingly complex, with many unfortunate incidents occurring on school campuses. This study aims to validate the effectiveness of a system for detecting violent acts in school environments. The training dataset includes 48 videos that we recorded to simulate violence, in full HD resolution (1080x1920) at 30 fps and of varying lengths, plus 11 videos of bullying and normal activities from the Internet. The test dataset includes 3 videos that we recorded and 7 videos from the Internet. The system detects school violence by classifying the behavior of each person appearing in the video as bully, victim, or outsider. The YOWOv2 model [1], which comprises two branches (a 3D-CNN using ResNeXt-101 and a 2D-CNN using the YOLOv7 backbone) together with a channel fusion and attention mechanism (CFAM) block, is adapted to this task through transfer learning. In validation, the best model achieved a mean average precision (mAP) of 75.7355%, with an AP of 77.1823% for the “bully” class, 78.22004% for the “victim” class, and 71.8043% for the “outsider” class. In testing, the best model achieved an mAP of 65.2007%, with an AP of 63.337% for the “bully” class, 78.93005% for the “outsider” class, and 53.3352% for the “victim” class. The model can be improved and developed into an automatic assessment method that detects violent acts in the school environment in real time to meet practical monitoring needs.
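The reported mAP values are consistent with an unweighted mean of the three per-class AP values. The short Python sketch below checks that arithmetic using only the figures quoted in the abstract. The function name `mean_average_precision` and the dictionary layout are illustrative assumptions, not part of the authors' code, and the sketch does not reproduce how the per-class AP values themselves were computed.

```python
def mean_average_precision(ap_by_class):
    """Return the unweighted mean of per-class AP values (in percent).

    Hypothetical helper for illustration; it assumes mAP is the plain
    average over classes, which matches the figures in the abstract.
    """
    return sum(ap_by_class.values()) / len(ap_by_class)


# Per-class AP values (percent) as quoted in the abstract.
validation_ap = {"bully": 77.1823, "victim": 78.22004, "outsider": 71.8043}
test_ap = {"bully": 63.337, "victim": 53.3352, "outsider": 78.93005}

val_map = mean_average_precision(validation_ap)
test_map = mean_average_precision(test_ap)

print(f"validation mAP: {val_map:.4f}%")
print(f"test mAP:       {test_map:.4f}%")
```

Both results agree with the reported 75.7355% and 65.2007% to within rounding in the last digit. The gap between them comes largely from the “victim” class, whose AP is about 25 points lower in testing than in validation.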
Computer Science, Education