Abstract: Small object detection remains one of the difficult problems in computer vision, and its accuracy still needs improvement, especially against complex image backgrounds. In this article, we present a small object detection network based on YOLOv4. It addresses several obstacles that limit traditional methods on small object detection tasks in complex road environments, such as few effective features, image noise, and occlusion by large objects, and it improves the detection of small objects against complex backgrounds such as drone aerial survey images. First, the improved architecture reduces the computation and GPU memory consumption of the network by incorporating the cross-stage partial network (CSPNet) structure into the spatial pyramid pooling (SPP) structure of YOLOv4 and into the convolutional layers after the concatenation operation. Second, the accuracy of the model on the small object detection task is improved by adding a detection head better suited to small objects and removing the head used for large object detection. Third, a new branch is added to extract feature information at a shallow location in the backbone, and the features from this branch are fused in the neck to enrich the small object location information extracted by the model; when feature information from different backbone levels is fused, a weighting mechanism increases the fusion weight of useful information to improve detection performance at each scale. Finally, a coordinate attention (CA) module is embedded at a suitable location in the neck, which enables the model to focus on spatial location relationships and inter-channel relationships and enhances its feature representation capability.
The proposed model was tested on detecting 10 different types of target objects in drone aerial images and five different road traffic signal signs in images taken from vehicles in complex road environments. The detection speed of the model meets the criteria for real-time detection, the model is more accurate than existing state-of-the-art detection models, and it has only 44M parameters. On the drone aerial photography dataset, YOLOv4 and YOLOv5L reach a mean average precision (mAP) of 42.79% and 42.10%, respectively, while our model achieves an mAP of 52.76%; on the urban road traffic light dataset, the proposed model achieves an average accuracy of 96.98%, which is also better than YOLOv4 (95.32%), YOLOv5L (94.79%), and other advanced models. This work provides an efficient method for small object detection in complex road environments, and it can be extended to other scenarios involving small object detection, such as drone cruising and autonomous driving.
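
The weighting mechanism used when fusing features from different backbone levels can be illustrated with a small sketch. The abstract does not specify the exact scheme, so the example below assumes a normalized, non-negative weighted sum (in the style of the "fast normalized fusion" used in BiFPN); the function name `weighted_fusion`, the `eps` parameter, and the use of flat Python lists in place of feature maps are all hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def weighted_fusion(
    features: Sequence[Sequence[float]],
    raw_weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse equally sized feature vectors with normalized non-negative weights.

    Assumption: this mirrors a BiFPN-style normalized fusion; the weighting
    scheme in the paper may differ.
    """
    if not features or len(features) != len(raw_weights):
        raise ValueError("need one weight per feature vector")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all feature vectors must have the same length")

    # Clamp weights at zero (ReLU) so no input is subtracted from the others.
    weights = [max(0.0, w) for w in raw_weights]
    # eps keeps the division stable if every weight is clamped to zero.
    total = sum(weights) + eps

    return [
        sum(w * f[i] for w, f in zip(weights, features)) / total
        for i in range(length)
    ]


if __name__ == "__main__":
    shallow = [1.0, 2.0]  # stand-in for a shallow, high-resolution feature
    deep = [3.0, 4.0]     # stand-in for a deeper, more semantic feature
    print(weighted_fusion([shallow, deep], [1.0, 3.0]))
```

In a real network the weights would be learnable parameters updated during training, so the model itself decides how much each level contributes at each scale.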

ST-YOLOX: a Lightweight and Accurate Object Detection Network Based on Swin Transformer

Object Detection Based on Swin Deformable Transformer-BiPAFPN-YOLOX

An Object Detection Method Based on Improved YOLOX

Swin-Transformer-Based YOLOv5 for Small-Object Detection in Remote Sensing Images

Research on Autonomous Driving Image Recognition Based on a New Real-Time Object Detection Model YOLOv5st

Swin-Transformer-YOLOv5 for lightweight hot-rolled steel strips surface defect detection algorithm

Obstacle detection: improved YOLOX-S based on swin transformer-tiny

A novel algorithm for small object detection based on YOLOv4

STAE‐YOLO: Intelligent detection algorithm for risk management of construction machinery intrusion on transmission lines based on visual perception

Remote sensing object detection based on a combination of a CNN and the Swin transformer

YOLOv4-dense: A smaller and faster YOLOv4 for real-time edge-device based object detection in traffic scene

YOLO Adaptive Developments in Complex Natural Environments for Tiny Object Detection

YOLO-TLA: An Efficient and Lightweight Small Object Detection Model based on YOLOv5

An improved YOLOv7 model based on Swin Transformer and Trident Pyramid Networks for accurate tomato detection

ST-YOLOA: a Swin-transformer-based YOLO model with an attention mechanism for SAR ship detection under complex background

YOLO_SRv2: An evolved version of YOLO_SR

Swin-Transformer-Enabled YOLOv5 with Attention Mechanism for Small Object Detection on Satellite Images

LAYN: Lightweight Multi-Scale Attention YOLOv8 Network for Small Object Detection

SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient object detection

SP-YOLOv8s: An Improved YOLOv8s Model for Remote Sensing Image Tiny Object Detection

Lightweight object detection algorithm for robots with improved YOLOv5