AG-YOLO: Attention-guided Network for Real-Time Object Detection

Hangyu Zhu, Libo Sun, Wenhu Qin, Feng Tian
DOI: https://doi.org/10.1007/s11042-023-16568-3
IF: 2.577
2024-01-01
Multimedia Tools and Applications
Abstract: Existing neural network models directly add attention mechanisms to the network as a plug-and-play component to capture long-range dependencies and reconstruct feature maps. However, most methods do not fully tap the potential of attention in dealing with multi-scale problems. In this paper, an attention-guided YOLOv4 network (AG-YOLO) is proposed to address the multi-scale issue in object detection. We propose and apply multi-scale feature extraction to the later stages of the backbone, which not only enriches the feature hierarchy with low computational overhead but also models intra-scale and inter-scale correlations simultaneously to avoid missing key information. To reduce the redundant use of information flow, we propose a lightweight attention-guided feature pyramid network, which provides an efficient multi-level aggregation strategy based on multi-scale channel attention. In addition, a global context pathway is designed to reduce the dilution of high-level semantic information as it is transmitted through the network. Compared with the baseline, AG-YOLO increased mAP@0.5 by 1.67%, while the number of parameters and GFLOPs increased by only 0.33M and 0.18, respectively. The detection accuracy on small-object categories also improved.
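The abstract describes the attention-guided feature pyramid only as "a multi-level aggregation strategy based on multi-scale channel attention". The following is a minimal sketch of that general idea, not the authors' module. It assumes a fusion in the style of attentional feature fusion: a global branch (average-pool each channel, then a small MLP) and a local branch (the same kind of MLP applied at every pixel, i.e. 1x1 convolutions) are summed and passed through a sigmoid to give weights M, and two same-resolution pyramid levels are blended as M * low + (1 - M) * high. Batch normalization, real convolutions, and resampling between levels are omitted, and all function names and weight shapes are hypothetical. Feature maps are plain nested lists of shape C x H x W so the example runs with the standard library alone.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pointwise_mlp(vec, w1, w2):
    """Two-layer 1x1 'conv' on one channel vector: C -> C/r (ReLU) -> C."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

def ms_channel_attention(x, w1g, w2g, w1l, w2l):
    """Return attention weights in (0, 1) with the same C x H x W shape as x.

    Global branch: channel context from average-pooled channels.
    Local branch: point-wise context computed independently at each pixel.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    pooled = [sum(map(sum, ch)) / (h * w) for ch in x]
    g = pointwise_mlp(pooled, w1g, w2g)  # one value per channel
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for i in range(h):
        for j in range(w):
            pixel = [x[k][i][j] for k in range(c)]
            loc = pointwise_mlp(pixel, w1l, w2l)
            for k in range(c):
                # Broadcast the global term over all pixels, add the local term.
                out[k][i][j] = sigmoid(g[k] + loc[k])
    return out

def attention_guided_fuse(low, high, weights):
    """Blend two same-shape levels: M * low + (1 - M) * high, M from low + high."""
    summed = [[[a + b for a, b in zip(ra, rb)] for ra, rb in zip(ca, cb)]
              for ca, cb in zip(low, high)]
    m = ms_channel_attention(summed, *weights)
    return [[[mv * a + (1 - mv) * b for mv, a, b in zip(rm, ra, rb)]
             for rm, ra, rb in zip(cm, ca, cb)]
            for cm, ca, cb in zip(m, low, high)]

# Usage: 4 channels, reduction ratio 2, two random 3x3 levels.
random.seed(0)
C, R, S = 4, 2, 3

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

weights = (rand_matrix(C // R, C), rand_matrix(C, C // R),   # global branch
           rand_matrix(C // R, C), rand_matrix(C, C // R))   # local branch
low = [[[random.random() for _ in range(S)] for _ in range(S)] for _ in range(C)]
high = [[[random.random() for _ in range(S)] for _ in range(S)] for _ in range(C)]
fused = attention_guided_fuse(low, high, weights)
```

Because every weight in M lies in (0, 1), each fused value is a per-channel, per-position convex combination of the two levels, rather than a fixed element-wise sum, so the blend between detail-rich and semantically strong features can vary across channels and locations.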