Abstract: Object detection is widely applicable across many domains. Although numerous object detection methods are well established, complex scenes with occlusion remain challenging: information loss and dynamic changes reduce the features that distinguish a target from its background, which lowers detection accuracy. To address the shortcomings of existing models in detecting occluded objects in complex scenes, a novel approach based on the YOLOv8n architecture is proposed. First, a small-object detection head is added to the YOLOv8n architecture to better detect and localize small objects. Second, a mixed local channel attention mechanism is integrated into YOLOv8n; it leverages the features of the visible parts of a target to refine feature extraction degraded by occlusion. Third, Soft-NMS is introduced to optimize the candidate bounding boxes, addressing the problem of missed detections when similar targets overlap. Finally, using standard object detection evaluation metrics, a series of ablation experiments was conducted on the public CityPersons dataset, together with comparisons against other models and tests on several other datasets. The results show a mean average precision (mAP@0.5) of 0.676, a 6.7% improvement over the official YOLOv8 under identical experimental conditions, a 7.9% increase over Gold-YOLO, and a 7.1% increase over RT-DETR; the method also performs well on the other datasets. Although the added detection layers increase the computational load, the model still reaches 192 frames per second (FPS), which meets the real-time requirements of the vast majority of scenarios. These findings indicate that the improved method not only significantly enhances performance on occluded datasets but can also be transferred to other models to improve their performance.
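To illustrate why Soft-NMS helps with overlapping similar targets, the following is a minimal sketch of the commonly described Gaussian variant. The abstract does not state which variant or which parameter values were used, so the function names, `sigma`, and `score_thresh` here are illustrative assumptions and not the authors' implementation. Instead of deleting every box that overlaps the current best box, the sketch decays each overlapping box's score according to its IoU, so a heavily overlapped neighbor can still survive with reduced confidence.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS sketch (sigma and score_thresh are assumed values).

    Returns a list of (box_index, decayed_score) in selection order.
    """
    candidates = list(range(len(boxes)))
    scores = list(scores)  # copy so the caller's scores are not modified
    kept = []
    while candidates:
        # Select the remaining box with the highest current score.
        best = max(candidates, key=lambda i: scores[i])
        candidates.remove(best)
        kept.append((best, scores[best]))
        survivors = []
        for i in candidates:
            overlap = iou(boxes[best], boxes[i])
            # Decay the score smoothly instead of discarding the box outright.
            scores[i] *= math.exp(-(overlap ** 2) / sigma)
            if scores[i] >= score_thresh:
                survivors.append(i)
        candidates = survivors
    return kept

# Two heavily overlapping boxes (e.g. adjacent pedestrians) and one distant box.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
result = soft_nms(boxes, scores)
```

In this toy input the second box overlaps the first with an IoU of about 0.82. Hard NMS with a typical IoU threshold of 0.5 would discard it, whereas the sketch keeps it with a lowered score, which is the behavior that reduces missed detections among overlapping targets.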

Pose Detection in Complex Classroom Environment Based on Improved Faster R-CNN

Classroom Student Posture Recognition Based on an Improved High-Resolution Network

Object Detection Based on Faster R-CNN Algorithm with Skip Pooling and Fusion of Contextual Information

Research on Low-Resolution Pedestrian Detection Algorithms Based on R-CNN with Targeted Pooling and Proposal

Multi-Object Detection Based On Deep Learning In Real Classrooms

Pose Estimation for Swimmers in Video Surveillance

View Invariant Human Body Detection and Pose Estimation from Multiple Depth Sensors

Improved YOLO-Pose Crowd Pose Estimation

RFA-YOLO-POSE: A Fusion Algorithm for Pose Detection and Object Identification Amidst Complex Crowds

CC-PoseNet: Towards Human Pose Estimation in Crowded Classrooms

Human detection in dense scene of classrooms

A Feature-Optimized Faster Regional Convolutional Neural Network for Complex Background Objects Detection

RFF-PoseNet: A 6D Object Pose Estimation Network Based on Robust Feature Fusion in Complex Scenes

RNNPose: 6-DoF Object Pose Estimation Via Recurrent Correspondence Field Estimation and Pose Optimization

A Small Object Detection Algorithm Based on Improved Faster RCNN

Object Detection via Aspect Ratio and Context Aware Region-based Convolutional Networks

An Improved Faster R-CNN for Small Object Detection

Complex Scene Occluded Object Detection with Fusion of Mixed Local Channel Attention and Multi-Detection Layer Anchor-Free Optimization

Simultaneous Face Detection And Head Pose Estimation: A Fast And Unified Framework

Classroom Behavior Detection Based on Improved YOLOv5 Algorithm Combining Multi-Scale Feature Fusion and Attention Mechanism

Object identification and pose detection based on convolutional neural network