Abstract: Crowded object detection in heavy traffic remains a challenging task in autonomous driving and robotics, because densely gathered vehicles or pedestrians inevitably cause heavy occlusion. Highly overlapped objects are difficult to distinguish, and their bounding boxes are difficult to predict accurately, especially for small objects far down the road. To address this challenge, this paper proposes an improved YOLOv5s network for crowded road object detection that integrates a multi-scale feature fusion module with an attention mechanism. Specifically, to enhance the multi-scale representation of semantic features and to model object scale variation flexibly, we introduce an attention-guided pyramid feature fusion strategy into the YOLOv5s backbone network. We then design a C3CA module by embedding coordinate attention (CA) into the concentrated-comprehensive convolution (C3) module of the original YOLOv5s, which strengthens the extraction of discriminative features from overlapped objects. In addition, we add implicit detection heads (IDHs) to the detection head of the original YOLOv5s, which help the network learn implicit knowledge and improve detection accuracy. Finally, a simplified optimal transport assignment (SimOTA) strategy and a bounding box regression loss with a dynamic focusing mechanism are used to improve the detector's overall performance. Extensive experiments on the public BDD100K dataset and our self-built crowded road object dataset (XMRD) demonstrate the superiority of our model in crowded road scenarios. Our model achieves a mean average precision (mAP) of 71.2% on BDD100K and 88.2% on XMRD, an improvement of 3% over existing state-of-the-art models.
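The C3CA module embeds coordinate attention (CA) into C3, but the abstract does not spell out what CA computes. As a rough illustration only, here is a minimal pure-Python sketch of one CA block on a single C x H x W feature map: pool along each spatial direction, pass the concatenated result through a shared 1x1 convolution, split it, and turn each half into per-row and per-column attention weights. It assumes ReLU in place of the batch normalization and h-swish used in the original CA design, and no biases; the function and weight names are hypothetical and this is not the paper's implementation.

```python
import math
from typing import List

Tensor3D = List[List[List[float]]]  # indexed [channel][row][col]
Matrix = List[List[float]]          # indexed [out_channel][in_channel] or [channel][position]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def conv1x1(w: Matrix, feats: Matrix) -> Matrix:
    """1x1 convolution on a C_in x L map: out[o][l] = sum_c w[o][c] * feats[c][l]."""
    length = len(feats[0])
    return [[sum(w_oc * feats[c][l] for c, w_oc in enumerate(row)) for l in range(length)]
            for row in w]


def coordinate_attention(x: Tensor3D, w_shared: Matrix, w_h: Matrix, w_w: Matrix) -> Tensor3D:
    """Hypothetical CA sketch. w_shared: C_mid x C, w_h and w_w: C x C_mid."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    # 1) Direction-aware pooling: average over the width (one value per row)
    #    and over the height (one value per column), for every channel.
    pooled_h = [[sum(x[c][i]) / width for i in range(height)] for c in range(channels)]
    pooled_w = [[sum(x[c][i][j] for i in range(height)) / height for j in range(width)]
                for c in range(channels)]
    # 2) Concatenate along the spatial axis (C x (H + W)), shared 1x1 conv, ReLU (assumed).
    joint = [pooled_h[c] + pooled_w[c] for c in range(channels)]
    mid = [[max(0.0, v) for v in row] for row in conv1x1(w_shared, joint)]
    # 3) Split back into the height part and the width part.
    mid_h = [row[:height] for row in mid]
    mid_w = [row[height:] for row in mid]
    # 4) Separate 1x1 convs back to C channels, then sigmoid gives attention in (0, 1).
    a_h = [[sigmoid(v) for v in row] for row in conv1x1(w_h, mid_h)]
    a_w = [[sigmoid(v) for v in row] for row in conv1x1(w_w, mid_w)]
    # 5) Reweight each position by its row attention and its column attention.
    return [[[x[c][i][j] * a_h[c][i] * a_w[c][j] for j in range(width)]
             for i in range(height)] for c in range(channels)]
```

Because each position is scaled by one row weight and one column weight, the block can emphasize a horizontal band and a vertical band independently, which is the property that makes CA attractive for locating one object among overlapping neighbours. With all-zero weights every attention value is sigmoid(0) = 0.5, so the output is exactly 0.25 times the input.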
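Implicit detection heads build on the implicit-knowledge idea from YOLOR, which YOLOv7 also uses in its IDetect head: a learned per-channel vector is added to the features before the 1x1 prediction convolution, and another learned per-channel vector multiplies its output. Assuming the IDHs follow that design, a minimal pure-Python sketch on a C x L feature map (L flattened spatial positions) looks like this; the helper names and the near-identity initialization (additive vector around 0, multiplicative vector around 1, standard deviation 0.02) are assumptions modelled on the YOLOR-style heads, not taken from the paper.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]  # indexed [channel][position]


def implicit_add(x: Matrix, a: List[float]) -> Matrix:
    """ImplicitA-style step: add one learned value per channel at every position."""
    return [[v + a[c] for v in row] for c, row in enumerate(x)]


def implicit_mul(x: Matrix, m: List[float]) -> Matrix:
    """ImplicitM-style step: scale each channel by one learned value."""
    return [[v * m[c] for v in row] for c, row in enumerate(x)]


def conv1x1(w: Matrix, x: Matrix) -> Matrix:
    """Plain 1x1 prediction conv without bias; w is C_out x C_in."""
    return [[sum(w_oc * x[c][l] for c, w_oc in enumerate(row)) for l in range(len(x[0]))]
            for row in w]


def implicit_head(x: Matrix, a: List[float], w: Matrix, m: List[float]) -> Matrix:
    # Order assumed from YOLOR/YOLOv7-style heads: add implicit, predict, multiply implicit.
    return implicit_mul(conv1x1(w, implicit_add(x, a)), m)


def init_implicit(c_in: int, c_out: int, rng: random.Random) -> Tuple[List[float], List[float]]:
    """Start close to an identity head so training begins from the plain conv head."""
    a = [rng.gauss(0.0, 0.02) for _ in range(c_in)]
    m = [rng.gauss(1.0, 0.02) for _ in range(c_out)]
    return a, m
```

The appeal of this design is its cost: the head gains only C_in + C_out parameters, and at inference time the additive vector can be folded into the convolution bias and the multiplicative vector into its weights, so the deployed head is no slower than the original.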
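SimOTA is named but not described in the abstract. The sketch below follows the formulation popularized by YOLOX as we understand it: a pairwise cost of classification loss plus 3 times the IoU loss (-log IoU) plus a large penalty for anchors outside the ground-truth box/centre region; each ground truth then takes a dynamic number k of lowest-cost anchors, where k is the integer part of the sum of its top-10 IoUs (at least 1); an anchor claimed by several ground truths goes to the one with the lowest cost. Inputs are assumed to be precomputed num_gt x num_anchors grids, and the function names are hypothetical; this is not the paper's code.

```python
import math
from typing import Dict, List

Grid = List[List[float]]  # indexed [gt][anchor]


def simota_cost(cls_loss: Grid, ious: Grid, in_center: List[List[bool]],
                iou_weight: float = 3.0, outside_penalty: float = 1e5) -> Grid:
    """Pairwise matching cost; the 1e-8 keeps log() finite when IoU is 0."""
    return [[cls_loss[g][a] + iou_weight * -math.log(ious[g][a] + 1e-8)
             + (0.0 if in_center[g][a] else outside_penalty)
             for a in range(len(ious[g]))] for g in range(len(ious))]


def simota_assign(cost: Grid, ious: Grid, n_candidates: int = 10) -> Dict[int, int]:
    """Return {anchor_index: gt_index} for every positive anchor."""
    num_gt, num_anchors = len(cost), len(cost[0])
    picked_by: Dict[int, List[int]] = {}
    for g in range(num_gt):
        # Dynamic k: a GT with many well-overlapping anchors gets more positives.
        top_ious = sorted(ious[g], reverse=True)[:min(n_candidates, num_anchors)]
        k = max(1, int(sum(top_ious)))
        lowest = sorted(range(num_anchors), key=lambda idx: cost[g][idx])[:k]
        for idx in lowest:
            picked_by.setdefault(idx, []).append(g)
    assignment: Dict[int, int] = {}
    for idx, gts in picked_by.items():
        if len(gts) == 1:
            assignment[idx] = gts[0]
        else:
            # Conflict: the anchor goes to the GT with the lowest cost for it.
            assignment[idx] = min(range(num_gt), key=lambda g: cost[g][idx])
    return assignment
```

For example, with costs `[[1.0, 2.0, 9.0, 9.0], [9.0, 1.5, 3.0, 9.0]]` and IoUs `[[0.9, 0.9, 0.3, 0.0], [0.1, 0.8, 0.6, 0.1]]`, the first ground truth gets k = 2 and claims anchors 0 and 1, the second gets k = 1 and also claims anchor 1, and the conflict resolves anchor 1 to the second ground truth because its cost (1.5) is lower. Letting k vary per object is what helps in crowded scenes, where heavily occluded objects would otherwise compete for a fixed number of positives.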
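The phrase "bounding box regression loss with a dynamic focusing mechanism" matches the subtitle of Wise-IoU (WIoU). Assuming that is the loss meant, the sketch below computes WIoU v3 for one pair of axis-aligned boxes: the IoU loss is scaled by an exponential centre-distance term normalised by the enclosing box, then by a non-monotonic focusing coefficient driven by the outlier degree beta (this box's IoU loss divided by a running mean of IoU losses). The values alpha = 1.9 and delta = 3, the momentum, and all function names are assumptions for illustration; treat them as tunable, not as the paper's settings.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed non-degenerate


def iou(p: Box, g: Box) -> float:
    iw = max(0.0, min(p[2], g[2]) - max(p[0], g[0]))
    ih = max(0.0, min(p[3], g[3]) - max(p[1], g[1]))
    inter = iw * ih
    union = (p[2] - p[0]) * (p[3] - p[1]) + (g[2] - g[0]) * (g[3] - g[1]) - inter
    return inter / union


def wiou_v1(p: Box, g: Box) -> Tuple[float, float]:
    """Return (L_IoU, L_WIoUv1)."""
    l_iou = 1.0 - iou(p, g)
    dx = (p[0] + p[2]) / 2 - (g[0] + g[2]) / 2
    dy = (p[1] + p[3]) / 2 - (g[1] + g[3]) / 2
    # Width and height of the smallest enclosing box; in training this term
    # is detached from the gradient, which plain floats cannot express.
    wg = max(p[2], g[2]) - min(p[0], g[0])
    hg = max(p[3], g[3]) - min(p[1], g[1])
    r_wiou = math.exp((dx * dx + dy * dy) / (wg * wg + hg * hg))
    return l_iou, r_wiou * l_iou


def wiou_v3(p: Box, g: Box, mean_l_iou: float,
            alpha: float = 1.9, delta: float = 3.0) -> float:
    l_iou, l_v1 = wiou_v1(p, g)
    beta = l_iou / mean_l_iou                       # outlier degree of this box
    focus = beta / (delta * alpha ** (beta - delta))  # non-monotonic focusing coefficient
    return focus * l_v1


def update_mean(mean_l_iou: float, l_iou: float, momentum: float = 0.01) -> float:
    """Exponential running mean of the IoU loss (hypothetical momentum)."""
    return (1.0 - momentum) * mean_l_iou + momentum * l_iou
```

The focusing coefficient equals 1 when beta equals delta and shrinks for both very small and very large beta, so boxes that are already good and extreme outliers (for example, low-quality labels on heavily occluded objects) both contribute less gradient than ordinary-quality boxes.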

A method of cross-layer fusion multi-object detection and recognition based on improved faster R-CNN model in complex traffic environment

Local Fast R-CNN Flow for Object-Centric Event Recognition in Complex Traffic Scenes

Multi-scale feature fusion with attention mechanism for crowded road object detection

Multiclass objects detection algorithm using DarkNet-53 and DenseNet for intelligent vehicles

Lightweight Real-Time Object Detection via Enhanced Global Perception and Intra-Layer Interaction for Complex Traffic Scenarios

Small Object Detection in Traffic Scenes Based on Attention Feature Fusion

A Multi-Scale Target Detection Method Using an Improved Faster Region Convolutional Neural Network Based on Enhanced Backbone and Optimized Mechanisms

Multi-scale multi-modal fusion for object detection in autonomous driving based on selective kernel

GHAFNet: Global-context hierarchical attention fusion method for traffic object detection

An Improved Faster R-CNN for Small Object Detection

3D Object Detection Based on Attention and Multi-Scale Feature Fusion

An improved YOLOv5 method for large objects detection with multi-scale feature cross-layer fusion network

Traffic Sign Detection and Recognition Using Multi-Scale Fusion and Prime Sample Attention

Hybrid dilated multilayer faster RCNN for object detection

A Vision Enhancement and Feature Fusion Multiscale Detection Network

Road detection via a dual-task network based on cross-layer graph fusion modules

An Adaptive Attention Fusion Mechanism Convolutional Network for Object Detection in Remote Sensing Images

Voxel-RCNN-Complex: An Effective 3-D Point Cloud Object Detector for Complex Traffic Conditions

Deep multi-scale and multi-modal fusion for 3D object detection

A Multi-Scale Traffic Object Detection Algorithm for Road Scenes Based on Improved YOLOv5

A Small-Scale Object Detection Algorithm in Intelligent Transportation Scenarios